Poster
Towards Visual Text Design Transfer Across Languages
Yejin Choi · Jiwan Chung · Sumin Shim · Giyeong Oh · Youngjae Yu
The art of visual text design serves as a potent medium for conveying themes, emotions, and atmospheres within a multimodal context. From compelling film posters to evocative album covers, the fusion of typography and imagery transcends the communicative potential of mere words. Nevertheless, translating the essence of a visual style across disparate writing systems presents a substantial challenge for computational models. Can generative models accurately comprehend the intricacies of design and effectively transfer the intended aesthetic across linguistic boundaries? In this study, we introduce Multimodal Style Translation (MuST-Bench), a pioneering task designed to evaluate the efficacy of visual text translation across diverse writing systems. Our studies with MuST-Bench reveal that current visual text generation models struggle with the proposed task because textual descriptions are inadequate for conveying visual design. We introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions. SIGIL enhances image generation models through three innovations: glyph latents for multilingual settings, pretrained VAEs for stable style guidance, and an OCR model that provides reinforcement learning feedback to optimize the legibility of generated characters. SIGIL outperforms baselines in style consistency and legibility and, unlike description-based methods, maintains visual similarity. We plan to release our benchmark and model to inspire further research in multilingual visual text understanding and generation.
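The abstract names OCR feedback as a reinforcement learning signal for legible character generation but does not specify the reward itself. As a minimal sketch, assuming the reward compares the OCR transcription of a generated image against the target text, a normalized character-level edit distance could serve as that signal. The function names `levenshtein` and `ocr_legibility_reward` are hypothetical, and this is not SIGIL's published formulation.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def ocr_legibility_reward(ocr_text: str, target_text: str) -> float:
    """Hypothetical reward in [0, 1]: 1 minus the normalized edit distance.

    Assumption: ocr_text is whatever an OCR model read from the generated image.
    NFC normalization keeps composed Hangul/CJK characters as single code points,
    so one misread syllable costs exactly one edit.
    """
    ocr_text = unicodedata.normalize("NFC", ocr_text)
    target_text = unicodedata.normalize("NFC", target_text)
    longest = max(len(ocr_text), len(target_text))
    if longest == 0:
        return 1.0  # both empty: trivially a perfect match
    return 1.0 - levenshtein(ocr_text, target_text) / longest
```

For example, if OCR reads `안넝하세요` where the target is `안녕하세요`, one of five syllables is wrong and the reward is 0.8. Dividing by the longer string bounds the reward to [0, 1] regardless of text length, which keeps the scale comparable across writing systems with different word lengths.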