Creative AI
Creative AI Session 4
East Ballroom C
Jean Oh · Marcelo Coelho
AI Nüshu
Chuyan Xu · Yuying Tang · Yanran Wang · Ze Gao · Chang Hee Lee · Ali Asadipour · Yuqian Sun · Zhijun Pan
Nüshu (女书), also called "Women's Script," is a unique language created and used exclusively by women in Hunan Province, China, for centuries to communicate in a world where they were denied formal education. Can AI learn from these ancient Chinese women and create a secret language of its own? We present "AI Nüshu," an interactive art project that merges computational linguistics with the legacy of Nüshu. The project trains AI agents to imitate the illiterate women of ancient China and create a new language system, symbolizing the defiance of patriarchal constraints and the emergence of language in a non-Western, feminist context. This is the first art project to interpret Nüshu from a computational linguistics perspective. Two AI agents simulate how ancient women developed a unique language under the constraints of a patriarchal society. This cultural phenomenon resonates with the emergence of non-human machine language under human authority, both metaphorically and practically. Essentially, we integrate cultural phenomena into an AI system. Unlike systems built on predefined rules, such as Morse code, Markov chains, or fictional constructed languages, AI Nüshu evolves organically from the machine's environmental observations and feedback, mirroring the natural formation of human languages. Meaning in the vector space gradually emerges into the symbols of the system. Because this new language is decipherable and learnable by humans, especially Chinese speakers, it inherently challenges the existing paradigm in which humans are the linguistic authorities and machines are the learners.
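The abstract does not specify how the two agents converge on shared symbols, so the following is only a point of reference: a minimal Lewis signaling game in which two agents develop a shared meaning-to-symbol mapping purely from feedback. The meaning and symbol counts, and the Roth-Erev-style reinforcement rule, are illustrative assumptions, not the project's method.

```python
import numpy as np

rng = np.random.default_rng(0)
N_MEANINGS, N_SYMBOLS = 8, 8   # hypothetical sizes, not taken from the project

# Speaker and listener each keep reinforcement weights ("urns") over choices.
speaker_w = np.ones((N_MEANINGS, N_SYMBOLS))   # meaning -> symbol propensities
listener_w = np.ones((N_SYMBOLS, N_MEANINGS))  # symbol -> meaning propensities

def sample(weights):
    p = weights / weights.sum()
    return rng.choice(len(p), p=p)

for step in range(20_000):
    meaning = rng.integers(N_MEANINGS)      # "observation" from the environment
    symbol = sample(speaker_w[meaning])     # speaker emits a symbol
    guess = sample(listener_w[symbol])      # listener interprets it
    if guess == meaning:                    # shared success reinforces the pairing
        speaker_w[meaning, symbol] += 1.0
        listener_w[symbol, meaning] += 1.0

# After many rounds each meaning tends to settle on a stable, shared symbol.
print(speaker_w.argmax(axis=1))
```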
AI TrackMate: Finally, Someone Who Will Give Your Music More Than Just “Sounds Great!”
Chia-Ho Hsiung · LU RONG CHEN · Yen-Tung Yeh · Yi-lin Jiang · Bo-Yu Chen
The rise of "bedroom producers" has democratized music creation, while challenging producers to objectively evaluate their work. To address this, we present AI TrackMate, an LLM-based music chatbot designed to provide constructive feedback on music productions. By combining LLMs' inherent musical knowledge with direct audio track analysis, AI TrackMate offers production-specific insights, distinguishing it from text-only approaches. Our framework integrates a Music Analysis Module, an LLM-Readable Music Report, and Music Production-Oriented Feedback Instruction, creating a plug-and-play, training-free system compatible with various LLMs and adaptable to future advancements. We demonstrate AI TrackMate's capabilities through an interactive web interface and present findings from a pilot study with a music producer. By bridging AI capabilities with the needs of independent producers, AI TrackMate offers round-the-clock, nuanced feedback, potentially transforming the creative process and skill development in music production. This system addresses the growing demand for objective self-assessment tools in the evolving landscape of independent music production.
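To make the "analysis module, then LLM-readable report, then feedback instruction" pipeline concrete, here is a minimal sketch. The feature choices (tempo, RMS loudness, spectral centroid) and the prompt wording are assumptions standing in for AI TrackMate's actual Music Analysis Module and Feedback Instruction; the point is only the training-free, plug-and-play flow from audio to LLM prompt.

```python
import numpy as np
import librosa

def music_report(path: str) -> str:
    """Summarize an audio track as plain text an LLM can read."""
    y, sr = librosa.load(path, sr=None, mono=True)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)
    rms = librosa.feature.rms(y=y)[0]
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
    return (
        f"Estimated tempo: {float(tempo):.1f} BPM\n"
        f"Loudness (RMS) mean/std: {rms.mean():.4f} / {rms.std():.4f}\n"
        f"Spectral centroid mean: {centroid.mean():.0f} Hz"
    )

def feedback_prompt(report: str) -> str:
    """Wrap the report in a production-oriented instruction (wording is hypothetical)."""
    return (
        "You are a music production mentor. Based on this track analysis, "
        "give specific, constructive mixing and arrangement feedback:\n" + report
    )

# prompt = feedback_prompt(music_report("my_track.wav"))  # pass prompt to any LLM
```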
“Ambiguity in AI: The Paradox of Gender Neutrality” is a multi-faceted art installation that investigates the complexities of gender neutrality in artificial intelligence. This multifaceted work combines a conceptual video with an interactive auditory experience to explore how AI-generated voices handle gender ambiguity. The 3-minute video component contrasts societal strides toward recognizing non-binary identities with the difficulties AI faces in achieving true gender neutrality. It includes interviews with individuals sharing their experiences with non-binary identities, the introduction of the X gender designation on US passports, and visual depictions of AI systems like Google Assistant attempting to present neutral identities. The video also critiques the default female voices used in AI assistants and introduces Q, the world's first genderless voice assistant. Concluding segments pose questions about whether our technological advances reflect our progress or reveal underlying limitations. Accompanying the video is an interactive auditory experience featuring a curated collection of 100 daily conversations. These conversations were generated by text-to-speech models and evaluated by LGBTQ individuals to identify those perceived as gender-neutral. Attendees will engage with these dialogues through headphones and audio equipment, providing feedback on their perceptions of gender neutrality. This data will be compared with the original evaluations, offering insights into how diverse demographics perceive gender-neutral AI voices and potentially guiding future developments in voice generation technology. By engaging directly with the theme of ambiguity, this artwork prompts attendees to consider whether AI can genuinely reflect societal advancements toward non-binary and gender-neutral identities. The interactive component enriches the exploration by bridging theoretical insights with practical, user-based evaluations.
"break me, genAI" is an audiovisual composition that explores the complex interplay between human perception, musical context, and AI-driven visual generation. This work challenges traditional approaches to audiovisual art by incorporating multiple layers of both human input and machine learning processes.The piece begins with manually drawn MIDI parameter curves representing abstract musical qualities like "grittiness" and "rattle," as perceived by the artist. These curves drive procedurally generated visuals inspired by the synesthetic works of Kandinsky and Klee, creating a tight coupling between audio and visual elements. This initial layer is then processed through StyleGAN and further transformed using Stable Diffusion and AnimateDiff, guided by prompts derived from the original visual inspirations.Unlike conventional Music Information Retrieval-based visualizations that respond similarly to diverse musical inputs, this work aims to capture the nuanced human experience of listening. It acknowledges how musical elements evolve in perception throughout a piece, such as the difference between a bass line's first appearance and its 50th repetition.The resulting artwork embodies ambiguity on multiple levels. It raises questions about the influence of each generative layer on the final output, the preservation of the artist's original perceptual input through AI processing, and the extent to which the AI-generated visuals reflect the aesthetic of the inspiring artists."break me, genAI" invites viewers to contemplate the nature of artistic creation in the age of AI. It challenges us to consider how machine learning can be used not just as a tool for audio-reactive visuals, but as a means to explore and express the subjective, context-dependent nature of musical perception. The work suggests that the integration of human insight with AI capabilities can lead to more nuanced and emotionally resonant audiovisual experiences.
This article evaluates how creative uses of machine learning can address three adjacent terms: ambiguity, uncertainty, and indeterminacy. Through the progression of these concepts it reflects on increasing ambitions for machine learning as a creative partner, illustrated with research from Unit 21 at the Bartlett School of Architecture, UCL. Through indeterminacy, it points toward potential future approaches to machine learning and design.
Diff4Steer: Steerable Diffusion Prior for Generative Music Retrieval with Semantic Guidance
Joonseok Lee · Judith Yue Li · Xuchan Bao · Timo I. Denk · Kun Su · Fei Sha · Zhong Yi Wan · Dima Kuzmin
Current music retrieval systems often rely on deterministic seed embeddings to represent user preference, limiting their ability to capture users' diverse and uncertain retrieval needs. To address this, we propose Diff4Steer, a novel generative retrieval framework that leverages generative models to synthesize potential directions for exploration, represented by "oracle" seed embeddings. These embeddings capture the distribution of user preferences given retrieval queries, enabling more flexible and creative music discovery. Diff4Steer's lightweight diffusion-based generative models provide a statistical prior on the target modality (audio), which can be steered by image or text inputs to generate samples in the audio embedding space. These samples are then used to retrieve candidates via nearest neighbor search. Our framework outperforms deterministic regression methods and an LLM-based generative retrieval baseline in terms of retrieval and ranking metrics, demonstrating its effectiveness in capturing user preferences and providing diverse and relevant recommendations. We include an appendix and a demonstration website in the supplementary materials.
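The retrieval step described above (sample several seed embeddings, then nearest-neighbor search in the audio embedding space) can be sketched as follows. The `sample_seed_embeddings` function is a crude stand-in for Diff4Steer's steerable diffusion prior, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, CATALOG_SIZE, N_SEEDS, TOP_K = 128, 10_000, 8, 5

catalog = rng.normal(size=(CATALOG_SIZE, D))                  # precomputed track embeddings
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)

def sample_seed_embeddings(query_embedding, n=N_SEEDS, noise=0.3):
    """Stand-in for the diffusion prior: noisy samples around the query embedding."""
    samples = query_embedding + noise * rng.normal(size=(n, D))
    return samples / np.linalg.norm(samples, axis=1, keepdims=True)

def retrieve(query_embedding, k=TOP_K):
    seeds = sample_seed_embeddings(query_embedding)
    sims = seeds @ catalog.T                                  # cosine similarity (unit vectors)
    # Union of per-seed neighbors yields a diverse candidate pool.
    return np.unique(np.argsort(-sims, axis=1)[:, :k].ravel())

query = catalog[42] + 0.1 * rng.normal(size=D)
print(retrieve(query / np.linalg.norm(query)))
```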
Learning to Move, Learning to Play, Learning to Animate
erika roos · Han Zhang · Mingyong Cheng · Yuemeng Gu · Sophia Sun
"Learning to Move, Learning to Play, Learning to Animate" is a cross-disciplinary multimedia performance that challenges the human-centric perspective, offering a new way to experience the world. This performance features robots made from natural materials, using technologies like AI-generated visuals, real-time bio-feedback, and electroacoustic sound.Inspired by ecologist David Abram’s concept of the “more-than-human world,” our work explores intelligences beyond the human, questioning whether intelligence is exclusive to humans or can exist in wood, stone, metal, and silicon. The performance examines coexistence with technology and nature without dominance.Integrating organic robotics and real-time generative AI, the piece blends human creativity with synthetic intelligence. AI-generated visuals, processed using StreamDiffusion-TD and Kinect within Touch Designer, enable evolving interactions between performers and their digital counterparts. Bio-feedback systems translate plant signals into dynamic visual and auditory elements, fostering a symbiotic relationship between human, machine, and nature.The piece introduces ambiguity through the interconnectedness of human, ecological, and technological entities, blurring traditional distinctions. Shadows play a crucial role, symbolizing perception and reality, blending natural and synthetic elements to highlight unseen connections. Machines, often seen merely as our creations, exhibit the agency to create art and influence our understanding of the world.Through this immersive experience, we invite the audience to explore learning, interaction, and perception within the more-than-human world. Our performance aims to redefine the boundaries between the natural and the artificial, fostering a deeper understanding of the collective intelligence shared by humans, technology, and the environment.
Ørigin is an AI experimental short film that delves into the complex interplay between humanity, nature, and technology. Utilizing advanced AI tools such as Midjourney, Runway, Krea, and Luma Dream Machine, the film embarks on a profound exploration of human evolution, imagining a future where the artificial and the organic coexist harmoniously. The film presents a visually captivating narrative that traces the evolution of life, from the emergence of the first cells to the rise of complex life forms and the development of human civilization. As technology advances rapidly, Ørigin invites viewers to reflect on our shifting relationship with nature, moving from reverence to exploitation. The film emphasizes the consequences of this imbalance and underscores the urgent need to restore harmony between nature and technology. Engaging with the theme of ambiguity on multiple levels, Ørigin challenges conventional distinctions between the natural and artificial, prompting viewers to question what is "real" or "authentic." The film raises questions about the direction of human evolution and our role in shaping it, while also exploring the role of AI as both a tool and an active participant in the creative process. Through the innovative use of generative AI, the film's creators craft a unique visual style that is open to multiple interpretations, encouraging active viewer engagement. The narrative unfolds in a non-linear, abstract manner, allowing for diverse interpretations and personalizing the viewing experience. By embracing ambiguity in both form and content, Ørigin fosters ongoing dialogue about our evolving relationship with artificial intelligence and the physical world. This innovative approach has distinguished the film, earning it the Best Art Direction award at the Civitai Project Odyssey AI competition.
How will AI come to understand the syntactic logic of the world? What is a synthetic process that knows how to put together instance to instance? What does it … look like? An ordinary day is recounted on a train in rural Japan. The sequence is repeated over and over as it gets pulled apart and destroyed under various computational operations. An artificial video muscle emerges from the skeletons of a scene and reaches to stitch together the next frame, trying to patch up the distorted fragments of visual information.
Redefining Artistic Boundaries: A Real-Time Interactive Painting Robot for Musicians
Richard Savery · Justin Baird
This paper explores the intersection of artificial intelligence (AI) and creative expression through the development of a real-time interactive painting robot designed to accompany live musical performances. Using a robotic arm controlled by audio inputs processed through Max/MSP and communicated via Open Sound Control (OSC) messages, the system generates visual art that responds dynamically to music. This collaboration between human musicians and AI challenges traditional notions of creativity, authorship, and control, raising questions about the role of machines in the creative process. By analyzing the outcomes of various performance iterations, we examine the ambiguity in creative agency and the potential for AI to redefine artistic boundaries, offering insights into the evolving relationship between human and machine in the arts.
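The control path described (audio features processed in Max/MSP, sent as OSC messages to the robot) can be illustrated with a small sketch. The paper does this inside Max/MSP; here the python-osc library stands in, and the OSC address space (`/brush/...`), host, port, and the feature-to-stroke mappings are assumptions for illustration only.

```python
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("192.168.0.50", 9000)      # hypothetical robot-arm controller

def send_stroke(amplitude: float, centroid_hz: float):
    """Map audio features to brush parameters and send them over OSC."""
    # Louder passages -> heavier brush pressure; brighter timbre -> faster strokes.
    client.send_message("/brush/pressure", min(1.0, amplitude * 4.0))
    client.send_message("/brush/speed", min(1.0, centroid_hz / 8000.0))

send_stroke(amplitude=0.18, centroid_hz=2400.0)     # called per analysis frame during a set
```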
Rhythm Bots: A Sensitive Improvisational Environment
Kathryn Wantlin · Naomi Leonard · Jane Cox · Dan Trueman · María Santos · Isla Xi Han · Sarah Witzman · Tess James
Rhythm Bots is an actively controlled kinetic sculpture and art-making exploration of research on collective intelligence and collective behavior. The sculpture comprises a group of gentle, rhythmically rotating robots that propagate movement changes across a network in response to one another and to human audience members who sit with or move around them. In May 2024 at Princeton University’s Wallace Theater, we created an emergent environment, sensitive to stimuli, composed of twelve rhythm bots and human audience members. Each bot independently controlled its own movement according to its evolving “opinion” states while also activating lights and sound. Collective artificial intelligence was implemented through a distributed decision-making model known as nonlinear opinion dynamics (NOD). YOLOv3 and DeepSORT were used to detect and track individual humans, whose presence “excited” the attention parameters of nearby bots and sped up neuron-like dispersion of signals across the communication network. The piece draws on the expressiveness of the underlying dynamics model to encourage exploration of the ambiguity in the human-robot feedback loop. Audience members were naturally motivated to learn the boundaries of what emergent patterns they could induce, developing their own notions of “real” and “non-real” influence on the environment. The overall result was a peaceful, evolving environment featuring emergent synchronization modified by input and intermittently interrupted by input-triggered dynamic movement, light, and sound events. Connecting lights and sound to the space and to the robot behavior made the research more visible and audible, while enhancing the meditative nature of the environment. Rhythm Bots provides a creative platform for further art-making, novel human-machine physical interaction experiments involving movement, light, and sound, and continuing opportunities to use intelligent machines to impart positive feelings of wellbeing in people.
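In the spirit of the NOD model the piece uses, here is a minimal simulation of opinion states on a ring of twelve bots, with one bot's attention "excited" by a detected human. The parameter values, ring topology, and excitation rule are illustrative assumptions, not the installation's tuning.

```python
import numpy as np

N, DT, STEPS = 12, 0.05, 2000
rng = np.random.default_rng(1)

A = np.zeros((N, N))                       # ring communication network
for i in range(N):
    A[i, (i - 1) % N] = A[i, (i + 1) % N] = 1.0

z = 0.01 * rng.normal(size=N)              # opinion states (e.g., bias in rotation rhythm)
d, alpha, gamma = 1.0, 1.2, 0.8            # damping, self-reinforcement, neighbor coupling
u = np.full(N, 0.3)                        # attention parameters

def step(z, u):
    coupling = alpha * z + gamma * (A @ z)
    return z + DT * (-d * z + u * np.tanh(coupling))   # saturated (tanh) opinion update

for t in range(STEPS):
    u[:] = 0.3                             # baseline attention: opinions stay near neutral
    u[3] = 2.5                             # a tracked human near bot 3 excites its attention
    z = step(z, u)

# Opinion is strongest at the excited bot and decays along the ring as it disperses.
print(np.round(z, 2))
```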
SketcherX: AI-Driven Interactive Robotic Drawing with Diffusion Model and Vectorization Techniques
Nojun Kwak · Jookyung Song · Mookyoung Kang
We introduce SketcherX, a novel robotic system for personalized portrait drawing through interactive human-robot engagement. Unlike traditional robotic art systems that rely on analog printing techniques, SketcherX captures and processes facial images to produce vectorized drawings in a distinctive, human-like artistic style. The system comprises two 6-axis robotic arms: a face robot, equipped with a head-mounted camera and a Large Language Model (LLM) for real-time interaction, and a drawing robot, utilizing a fine-tuned Stable Diffusion model, ControlNet, and Vision-Language models for dynamic, stylized drawing. Our contributions include the development of a custom vector Low-Rank Adaptation (LoRA) model, enabling seamless adaptation to various artistic styles, and the integration of a pair-wise fine-tuning approach to enhance stroke quality and stylistic accuracy. Experimental results demonstrate the system's ability to produce high-quality, personalized portraits within two minutes, highlighting its potential as a new paradigm in robotic creativity. This work advances the field of robotic art by positioning robots as active participants in the creative process, paving the way for future explorations in interactive, human-robot artistic collaboration.
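The stylized-drawing stage can be sketched with off-the-shelf components: a captured face conditions a Stable Diffusion + ControlNet pipeline, with a style LoRA loaded on top. The model IDs, the Canny-edge conditioning, and the LoRA path are assumptions standing in for SketcherX's fine-tuned models; the vectorization and stroke-planning steps are omitted.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

face = cv2.imread("captured_face.jpg")                       # frame from the head camera
gray = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)                            # line-art conditioning image
control = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("my_vector_style_lora")               # hypothetical style adapter

portrait = pipe("line-art portrait, clean single-stroke style",
                image=control, num_inference_steps=25).images[0]
portrait.save("stylized_portrait.png")                       # would then be vectorized into strokes
```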
Sketchy Collections: Exploring Digital Museum Collections by Drawing via CLIP
Rebecca Fiebrink · Polo Sologub
In recent years, digital museum collections have made it possible for everyone to discover cultural heritage (CH) online. However, that does not mean that they are engaging or fun for casual users to explore. In this paper, we develop a web interface that lets users search and compare three museum collections by drawing images. We describe our approach of using CLIP as a feature extraction model for a Sketch-Based Image Retrieval (SBIR) model based on museum tags. Through qualitative experiments and a user study, we demonstrate that the model performs well in a CH context with interesting results and that the interface enables playful search and serendipitous discoveries.
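A minimal sketch of the retrieval idea follows: embed collection images once with CLIP, embed the user's drawing the same way, and rank by cosine similarity. The checkpoint and file paths are assumptions, and the paper's tag-based SBIR training is not reproduced; this only shows CLIP as a shared feature extractor for sketches and photographs.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(paths):
    """Return L2-normalized CLIP image embeddings for a list of image files."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

collection_paths = ["object_001.jpg", "object_002.jpg", "object_003.jpg"]
collection = embed(collection_paths)             # precomputed once per museum collection
query = embed(["user_sketch.png"])               # the visitor's drawing

scores = (query @ collection.T).squeeze(0)       # cosine similarity
for rank in scores.argsort(descending=True):
    print(collection_paths[int(rank)], float(scores[rank]))
```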
Tesseract.art, a real-time generative AI interactive painting robot for multidisciplinary art
Richard Savery · Justin Baird
Tesseract.art is a real-time generative AI-driven interactive painting robot designed for multidisciplinary art creation. This project pushes the boundaries of creative artificial intelligence, transforming how human expression in one form can be reinterpreted and manifested into another physical medium. The intention behind Tesseract.art is to explore and develop AI as an instrument, leading to the emergence of new art forms and novel relationships between creators and their artistic processes.
Towards a ‘Non-Universal’ Architecture: Designing with Others through Gestures (1:1 Scale)
Ioana Drogeanu
This project explores the dynamic relationship between human individuality and cultural heritage through the innovative use of digital tools, centred on the theme of “Ambiguity.” By employing virtual reality (VR) and machine learning (ML), it transforms cultural gestures into architectural elements, allowing participants to perform these gestures at a 1:1 scale in VR. The gestures are captured as 3D meshes, which are then interpolated across various body dimensions and gestural typologies using a 3D DCGAN, creating a diverse array of architectural fragments. This process fosters a novel form of co-authorship, blending human input with algorithms. The resulting structures are organised using self-organising maps (SOMs) and positioned through Python scripts, aligning the gestural meshes according to their spatial and temporal contexts. These architectural elements form community spaces such as playgrounds, café hubs, and performance areas, reflecting the unique rhythms and dimensions of their users. This approach challenges the traditional notion of architecture as static, proposing instead that buildings can be dynamic, evolving expressions of the communities they serve. The project also delves into the ambiguity of authorship, as the integration of VR and ML creates a blurred line between human and machine contributions. It raises questions about the true ‘author’ of the design, as personal gestures are algorithmically transformed into architectural elements, blending individual expressions with community representation. Moreover, the research examines the complexities of cultural representation, where digitising and modifying cultural gestures through AI both preserves and transforms the original heritage, challenging conventional ideas of authenticity. It addresses the challenges of using data-driven models to represent diverse populations, emphasising the need to balance statistical generalisations with the unique realities of individual experiences.
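The interpolation step can be sketched as blending the latent codes of two captured gestures and decoding each blend into a new fragment. `GestureGenerator3D` below is a hypothetical placeholder for the project's trained 3D DCGAN generator, and the latent size and voxel resolution are assumptions.

```python
import torch
import torch.nn as nn

LATENT_DIM = 128                                   # assumed latent size

class GestureGenerator3D(nn.Module):
    """Placeholder decoder: latent vector -> 32^3 occupancy grid (voxelized mesh)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 512), nn.ReLU(),
                                 nn.Linear(512, 32 * 32 * 32), nn.Sigmoid())

    def forward(self, z):
        return self.net(z).view(-1, 32, 32, 32)

gen = GestureGenerator3D()
z_a = torch.randn(LATENT_DIM)                      # latent code of gesture A
z_b = torch.randn(LATENT_DIM)                      # latent code of gesture B

fragments = []
for t in torch.linspace(0, 1, steps=5):            # blend across gestural typologies
    z = (1 - t) * z_a + t * z_b
    fragments.append(gen(z.unsqueeze(0)))          # each blend is a candidate fragment

print(torch.stack(fragments).shape)                # 5 voxel grids, ready for meshing
```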
The latent spaces of many generative models are rich in unexplored valleys and mountains. The majority of tools used for exploring them have so far been limited to Graphical User Interfaces (GUIs). While specialized hardware can be used for this task, we show that simple feature extraction from pre-trained Convolutional Neural Networks (CNNs), applied to a live RGB camera feed, does a very good job of manipulating the latent space through simple changes in the scene, with vast room for improvement. We name this new paradigm Visual-reactive Interpolation, and the full code can be found at https://github.com/PDillis/stylegan3-fun.
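A minimal sketch of the idea: pool CNN features from a live camera frame and project them into a generator's latent space, so that changes in the scene move the latent code. The ResNet backbone, the fixed random projection, and the latent size are assumptions; the linked repository wires this kind of mapping into StyleGAN3 directly.

```python
import cv2
import torch
from torchvision import models, transforms

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()                  # expose the 512-d pooled features
backbone.eval()

preprocess = transforms.Compose([
    transforms.ToPILImage(), transforms.Resize((224, 224)), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

LATENT_DIM = 512                                   # assumed generator latent size
projection = torch.randn(512, LATENT_DIM) / 512 ** 0.5   # fixed feature -> latent map

cap = cv2.VideoCapture(0)                          # live RGB camera feed
ok, frame = cap.read()
if ok:
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    with torch.no_grad():
        feats = backbone(preprocess(rgb).unsqueeze(0))    # (1, 512) scene descriptor
    z = feats @ projection                         # latent code that tracks the scene
    print(z.shape)                                 # feed z to the generator every frame
cap.release()
```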