The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Given a text prompt describing a character, our method distills a representation that enables consistent depiction of the same character in novel contexts.
I am a Ph.D. candidate at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, under the joint supervision of Prof. Dani Lischinski and Dr. Ohad Fried.
In addition, I am a Research Intern at Google AI (Google Research). Previously, I spent the summer of 2022 at Meta AI Research (FAIR) as a Research Scientist Intern.
My research interests include machine learning, computer vision, and generative models. More specifically, I am interested in developing new tools for content synthesis and editing, an area popularly known as Generative AI.
Given a single image containing multiple concepts, annotated with loose segmentation masks, our method learns a distinct token for each concept and uses natural language guidance to re-synthesize the individual concepts, or combinations of them, in various contexts.
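To give a flavor of how a masked objective can tie each learned token to its own region, here is a schematic PyTorch sketch (an illustration under simplifying assumptions, not the exact training loss; the function and argument names are hypothetical):

```python
import torch

def masked_denoising_loss(noise_pred, noise_target, concept_masks, active_concepts):
    """Schematic masked diffusion loss for per-concept token learning (illustrative only).

    noise_pred / noise_target: (B, C, H, W) outputs/targets of a denoising network.
    concept_masks: dict mapping each concept token to a (B, 1, H, W) mask.
    active_concepts: the concept tokens mentioned in the current training prompt.
    """
    # Union of the masks of the concepts that appear in the prompt.
    union = torch.zeros_like(noise_target[:, :1])
    for concept in active_concepts:
        union = torch.maximum(union, concept_masks[concept])
    # Penalize reconstruction error only inside that region,
    # tying each learned token to its own concept.
    return (union * (noise_pred - noise_target) ** 2).mean()
```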
Given a NeRF scene, our pipeline trains a NeRF generator, guided by a similarity loss from a language-image model such as CLIP, to synthesize a new object inside a user-specified ROI.
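As a rough illustration of this kind of language-image guidance (a minimal sketch rather than the exact loss used in the project; it assumes PyTorch and the OpenAI `clip` package, and `rendered_view` stands in for a view rendered by the NeRF generator):

```python
# Illustrative CLIP-guidance loss (a sketch, not the exact objective from the paper).
import torch
import clip  # OpenAI CLIP package: github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # keep everything in fp32 for simplicity

# Standard CLIP input normalization constants.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

def clip_similarity_loss(rendered_view: torch.Tensor, prompt: str) -> torch.Tensor:
    """Negative cosine similarity between a rendered ROI view and a text prompt.

    `rendered_view` is assumed to be a (1, 3, H, W) tensor in [0, 1] produced by a
    differentiable renderer (here, a stand-in for the NeRF generator).
    """
    # Resize to CLIP's expected input resolution and normalize.
    image = torch.nn.functional.interpolate(rendered_view, size=(224, 224),
                                            mode="bilinear", align_corners=False)
    image = (image - CLIP_MEAN) / CLIP_STD

    image_emb = model.encode_image(image)
    text_emb = model.encode_text(clip.tokenize([prompt]).to(device))

    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    # Minimizing this loss pushes the rendered view toward the text description.
    return -(image_emb * text_emb).sum(dim=-1).mean()
```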
We propose a new method for text-to-image generation using open-vocabulary scene control.
We present an accelerated solution to the task of local text-driven editing of generic images, where the desired edits are confined to a user-provided mask.
We introduce a solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask.
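The idea shared by both of these blended-editing projects can be summarized by a single per-step blending operation: at every diffusion step, everything outside the user mask is replaced with a suitably noised version of the original input, so only the masked region is actually synthesized. A schematic PyTorch-style sketch, where `denoise_step` and `noise_to_level` are hypothetical placeholders for the model's reverse and forward diffusion processes:

```python
def blended_editing_step(x_t, x_orig, mask, t, denoise_step, noise_to_level):
    """One schematic iteration of mask-blended diffusion editing (illustrative only).

    x_t:    current noisy image or latent, shape (B, C, H, W)
    x_orig: the original, unedited input in the same space
    mask:   1 inside the region to edit, 0 elsewhere (broadcastable to x_t)
    denoise_step(x, t):   placeholder for one reverse-diffusion step of the model
    noise_to_level(x, t): placeholder that noises a clean input to noise level t
    """
    # Let the diffusion model denoise the whole image/latent by one step.
    x_prev = denoise_step(x_t, t)
    # Noise the original input to the matching level so the two can be mixed.
    bg_prev = noise_to_level(x_orig, t - 1)
    # Keep generated content inside the mask; restore the original content outside it.
    return mask * x_prev + (1.0 - mask) * bg_prev
```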
We tackle the problem of model merging under two constraints that often come up in the real world: (1) no access to the original training data, and (2) no increase in the size of the neural network.
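For context only, and not the method proposed in the work itself, the simplest merge that respects both constraints is plain weight interpolation between two models sharing an architecture: it needs no training data and keeps the parameter count unchanged. A minimal PyTorch sketch:

```python
import copy
import torch

def merge_by_weight_averaging(model_a: torch.nn.Module, model_b: torch.nn.Module,
                              alpha: float = 0.5) -> torch.nn.Module:
    """Naive data-free merge of two models with identical architectures.

    This is only the simplest possible baseline (weight interpolation), shown for
    illustration; it is not the merging method proposed in the paper. It uses no
    training data, and the merged model has the same size as each input model.
    """
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    assert sd_a.keys() == sd_b.keys(), "models must share the same architecture"

    merged_sd = {}
    for key in sd_a:
        if torch.is_floating_point(sd_a[key]):
            merged_sd[key] = alpha * sd_a[key] + (1.0 - alpha) * sd_b[key]
        else:
            # Integer buffers (e.g., BatchNorm step counters) are copied as-is.
            merged_sd[key] = sd_a[key].clone()

    merged_model = copy.deepcopy(model_a)
    merged_model.load_state_dict(merged_sd)
    return merged_model
```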