The Chosen One: Consistent Characters in Text-to-Image Diffusion Models
Given a text prompt describing a character, our method distills a representation that enables consistent depiction of the same character in novel contexts.
I am a Ph.D. candidate at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, under the joint supervision of Prof. Dani Lischinski and Dr. Ohad Fried.
I am also a Research Intern at NVIDIA Research. Before that, I was a Research Intern at Google Research during 2023 and at Meta AI Research (FAIR) during the summer of 2022.
My research interests include machine learning, computer vision, and generative models. More specifically, I am interested in developing new tools for content synthesis and editing, popularly known as Generative AI.
Given a single image containing multiple concepts, annotated with loose segmentation masks, our method can learn a distinct token for each concept and use natural language guidance to re-synthesize the individual concepts, or combinations of them, in various contexts.
Given a NeRF scene, our pipeline trains a NeRF generator model, guided by a similarity loss defined by a vision-language model such as CLIP, to synthesize a new object inside a user-specified ROI.
We suggest a new method for text-to-image generation using open-vocabulary scene control.
We present an accelerated solution to the task of local text-driven editing of generic images, where the desired edits are confined to a user-provided mask.
We introduce a solution for performing local (region-based) edits in generic natural images, based on a natural language description along with an ROI mask.
We tackle the problem of model merging under two constraints that often arise in the real world: (1) no access to the original training data, and (2) no increase in the size of the neural network.