Poster
FineStyle: Fine-grained Controllable Style Personalization for Text-to-image Models
Gong Zhang · Kihyuk Sohn · Meera Hahn · Humphrey Shi · Irfan Essa
Few-shot fine-tuning of text-to-image (T2I) generation models enables people to create unique artworks in their own style using natural language, without requiring extensive prompt engineering. However, fine-tuning with only a handful of image-text pairs, sometimes as few as one, prevents fine-grained control of style attributes at generation time. In this paper, we propose FineStyle, a few-shot fine-tuning paradigm that allows enhanced controllability for style-personalized text-to-image generation. To overcome the lack of training data for fine-tuning, we propose a novel concept-oriented data scaling that amplifies the number of image-text pairs, each of which focuses on a different concept (e.g., an object) in the style reference image. Moreover, we identify the benefit of parameter-efficient adapter tuning of the key and value kernels of cross-attention layers. Extensive experiments show the effectiveness of FineStyle at following fine-grained text prompts and delivering impressive visual quality faithful to the specified style, as measured by both CLIP similarity scores and human raters.
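As a rough illustration of the concept-oriented data scaling idea described in the abstract, the sketch below expands one style reference image into several image-text pairs, each focused on one concept. It is not the authors' implementation: the names `TrainingPair` and `scale_pairs`, the use of bounding-box crops to focus on a concept, and the caption template are all assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical box format: (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class TrainingPair:
    image_path: str
    crop: Optional[Box]  # None means the full reference image
    caption: str


def scale_pairs(
    image_path: str,
    style: str,
    concepts: List[Tuple[str, Box]],
) -> List[TrainingPair]:
    """Expand one reference image into several concept-focused pairs.

    Assumption: each concept is described by a name and a region of the
    reference image. The abstract does not specify how concepts are
    localized or how captions are phrased.
    """
    # One pair for the whole image, captioned with every concept.
    all_names = ", ".join(name for name, _ in concepts)
    pairs = [TrainingPair(image_path, None, f"{all_names} in {style} style")]

    # One extra pair per concept, so each caption targets a single concept.
    for name, box in concepts:
        pairs.append(TrainingPair(image_path, box, f"{name} in {style} style"))
    return pairs


pairs = scale_pairs(
    "ref.png",
    "watercolor",
    [("a cat", (0, 0, 64, 64)), ("a tree", (64, 0, 128, 128))],
)
for pair in pairs:
    print(pair.crop, "|", pair.caption)
```

Under these assumptions, a single reference image with two annotated concepts yields three training pairs instead of one, which is the sense in which the number of image-text pairs is amplified.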