Creative AI
Creative AI Performances 2
Jean Oh · Isabelle Guyon
Resonator: An AI-assisted Musical Experience for Human Connection
Erin Drake Kajioka
The “Resonator” project explores whether a global-youth-focused 3D game experience can provide a compelling way to discover new music while enabling players to express creativity (AI-illustrated playlists, music “song shapes”), resulting in greater direct engagement with music through exploration and discovery, and a deeper human understanding of AI.
Our spatial interface creates a 3D visualization for the MuLan joint embedding model. The software enables users to express creativity through the curation of music playlists while developing a more natural human understanding of how AI represents – and algorithmically navigates – the “space” of music.
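As a rough illustration of what a 3D visualization of a joint embedding space can involve, the sketch below projects placeholder track embeddings into three dimensions with PCA and navigates between nearest neighbors. The embedding dimensionality, projection method, and navigation logic are assumptions made for illustration; they are not the Resonator project's actual pipeline or the MuLan API.

```python
# Minimal sketch: project placeholder track embeddings into a navigable 3D space.
# The random embeddings and PCA projection are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
track_embeddings = rng.normal(size=(1000, 128))   # one 128-d embedding per track (placeholder)

# Project to 3D with PCA so each track becomes a point in a navigable space.
centered = track_embeddings - track_embeddings.mean(axis=0)
_, _, components = np.linalg.svd(centered, full_matrices=False)
positions_3d = centered @ components[:3].T        # (1000, 3) spatial coordinates

def nearest_tracks(query_idx, k=5):
    """Navigate the space: find the k tracks closest to a given track."""
    dists = np.linalg.norm(track_embeddings - track_embeddings[query_idx], axis=1)
    return np.argsort(dists)[1:k + 1]
```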
The experience is created by a group of game development engineers and designers who specialize in making 3D and 2D experiences intrinsically engaging. We are working to leverage that intrinsic engagement for the visualization, understanding, and evaluation of large models.
Emergent Rhythm — Real-time AI Generative DJ Set
Nao Tokui
“Emergent Rhythm” is an audio-visual DJ performance using real-time AI audio generation. Artist/DJ Tokui manipulates multiple models on stage to spontaneously generate rhythms and melodies, then combines and mixes the generated audio loops to create musical developments. Employing AI audio synthesis models in real-time poses an unprecedented challenge: everything heard during this performance is purely AI-generated sound.
As the title suggests, we focus on the musical and visual "rhythms" and recurring patterns that emerge in the interaction between multiple AI models and the artist. The accompanying visuals feature not only the periodicity over time but also the common patterns across multiple scales ranging from the extreme large-scale of the universe to the extreme small-scale of cell and atomic structures.
Aligning with the visual theme, we extracted loops from natural and man-made environmental sounds and used them as training data for audio generation. We also employ real-time timbre transfer that converts incoming audio into various singing voices, such as Buddhist chants. This highlights the diversity and commonality within the human cultural heritage.
We adapted the Generative Adversarial Network (GAN) architecture for audio synthesis. StyleGAN models trained on spectrograms of various sound materials generate spectrograms, and vocoder GAN models (MelGAN) convert them into audio files. By leveraging a GAN-based architecture, we can generate novel, constantly changing, morphing sounds, similar to GAN-generated animated faces of people who don’t exist. It takes about 0.5 seconds to generate a batch of 4-second, 2-bar loops, so generation is faster than real-time. We also implemented GANSpace, proposed by Härkönen et al., to provide perceptual controls during the performance. GANSpace applies Principal Component Analysis (PCA) to the style vectors of a trained StyleGAN model to find perceptually meaningful directions in the latent style space. Adding offsets along these directions allows the DJ to influence the audio generation in their desired direction.
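A minimal sketch of the GANSpace step follows, assuming a set of style vectors already sampled from the trained model's mapping network; the placeholder data, shapes, and parameter values are illustrative, not the performance's actual code.

```python
# GANSpace-style control (Härkönen et al.): PCA over sampled style vectors,
# then performer-controlled offsets along the principal directions.
import numpy as np

def principal_directions(w_samples, n_components=8):
    """PCA via SVD on centered style vectors; each returned row is a direction."""
    mean = w_samples.mean(axis=0)
    _, _, components = np.linalg.svd(w_samples - mean, full_matrices=False)
    return mean, components[:n_components]

def steer(w, directions, offsets):
    """Add performer-controlled offsets along the principal directions."""
    w_edit = np.array(w, dtype=float, copy=True)
    for direction, amount in zip(directions, offsets):
        w_edit += amount * direction
    return w_edit

# Example with placeholder data: 10,000 style vectors of dimension 512.
rng = np.random.default_rng(0)
w_samples = rng.normal(size=(10_000, 512))
mean_w, dirs = principal_directions(w_samples)
w_live = steer(mean_w, dirs, offsets=[2.0, -1.5, 0.0, 0.5, 0, 0, 0, 0])
# `w_live` would then drive the StyleGAN synthesis network to produce a
# spectrogram, which a MelGAN vocoder converts into an audio loop.
```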
From a DJ session, in which existing songs are selected and mixed, to a live performance that generates songs spontaneously and develops them in response to the audience's reactions: in this performance, the human DJ is expected to become an AJ, or “AI Jockey,” rather than a “Disc Jockey,” taming and riding the AI-generated audio stream in real-time. With the unique morphing sounds created by AI and the new degrees of freedom that AI allows, the AI Jockey will offer audiences a unique, even otherworldly sonic experience.
Entanglement
Haru Ji · Graham Wakefield
How much work must the universe do, and how many dreams does it have to nurture, in order to grow a single tree? And how much of the universe does a forest harbor?
Entanglement, inspired by the motif of the forest, is a large-scale (16 × 16 × 4 m) immersive artwork that invites spectators into a multi-sensory environment where visible and invisible worlds are interconnected and symbiotic. The artwork consists of three elements: the growth of trees through procedural modeling, generative AI that dreams images of trees and forests, and the operation of dynamic systems that connect tree roots with the mechanisms of fungi and bacteria, or of neural networks within a brain. Through the entanglement of microcosmic and simultaneous connections, it offers a sensory opportunity for contemplation and inspiration regarding ways of connecting with the world beyond ourselves, and a vision of an AI future that is fully present in its environment, as a diverse, living system in ecosystemic balance with the world. To borrow a phrase from Ursula Le Guin, our word for world is forest.
The artwork was produced using extensive custom software authored by the artists as well as SideFX Houdini and Stable Diffusion/ControlNet. Here we use generative AI unconventionally, as part of an ecosystem: not as an alternative artist, nor as a mere tool. As artists we are both thrilled by and apprehensive about the transformative power of generative AI, particularly regarding its role in artistic creativity. Diversity is crucial in our interconnected society; however, we extend the celebration of diversity beyond human-centered society to the whole ecosystem, as we think this is vital for a vigorous future, especially in awareness of the Anthropocene era.
The central AI concept of today is the optimization of prediction based on latent compression of vast stores of mostly human-created data. This means all-too-human partiality is embedded in the AI, which runs the risk of finding only what it is trained to seek and of further blinding us to the ecosystem we inhabit. For example, we found that running image generation in a feedback process quickly reveals the biased tendencies of the trained model. This required countermeasures, including additional image processing to suppress crowd-pleasing over-saturation and contrast, as well as latent adjustments to prevent the production of predicted preferences (such as human subjects, text titles, cuteness, inappropriate content, and so on). The feedback process also sometimes collapses into symmetric pattern-making or a complete loss of depth. This reminds us of the very real risk that positive feedback can narrow our world and our future, just like an echo chamber.
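The feedback process mentioned above can be sketched, in simplified form, as repeatedly feeding a model's output back in as its next input. The example below uses Hugging Face diffusers' img2img pipeline as a stand-in for the artists' custom Stable Diffusion/ControlNet system; the model ID, prompt, strength, and the crude saturation/contrast countermeasure are assumptions for illustration.

```python
# Rough sketch of an image-generation feedback loop with a countermeasure
# against runaway saturation and contrast. All settings are illustrative.
import numpy as np
import torch
from PIL import Image, ImageEnhance
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Start from noise; each pass feeds the previous output back in as the input.
image = Image.fromarray((np.random.rand(512, 512, 3) * 255).astype("uint8"))

for step in range(50):
    image = pipe(
        prompt="entangled forest, roots, fungal and neural networks",
        image=image,
        strength=0.4,        # how far each pass may drift from its input
        guidance_scale=6.0,
    ).images[0]
    # Crude countermeasure against crowd-pleasing over-saturation and contrast.
    image = ImageEnhance.Color(image).enhance(0.9)
    image = ImageEnhance.Contrast(image).enhance(0.95)
    image.save(f"feedback_{step:03d}.png")
```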
This artwork was completed through a collaboration between the artists of Artificial Nature (Haru Ji & Graham Wakefield) and researchers from the Yonsei University Intelligence Networking Lab (Chanbyoung Chae & Dongha Choi). We thank Digital Silence and the Ulsan Art Museum for organizing the exhibition, and UCSB (MAT and the AlloSphere Research Group), SBCAST, and York University (Alice Lab) for workspace and hardware support.
Fusion: Landscape and Beyond
Mingyong Cheng · Xuexi Dang · Zetao Yu · Xingwen Zhao
Fusion: Landscape and Beyond is an interdisciplinary art project that explores the relationship between memory, imagination, and Artificial Intelligence (AI) embodied in the centuries-long practices and discourse of Shan-Shui-Hua, Chinese landscape painting. It draws inspiration from the concept of Cultural Memory, in which memories are selectively retrieved and updated based on present circumstances. The project considers text-to-image AI algorithms as analogous to Cultural Memory, as they generate diverse and imaginative images using pre-existing knowledge. In response to this analogy, the project introduces the concept of "AI memory" and situates it in the culturally significant Chinese landscape painting, a synthetic embodiment of creativity derived from the artist's memory.
Diversity serves as both a driving force and a major inspiration for this project, which delves deeply into bias and the necessity of cultural diversity within machine-learning generative models for creative art. Recognizing that machines inherently exhibit bias stemming from their design and predominant use, it becomes essential to acknowledge and rectify such prejudices, particularly from a cultural standpoint. The initial phase of this project involves fine-tuning the Stable Diffusion model. The necessity for fine-tuning stems from the imperative to infuse a deeper cultural resonance into the AI's creations, ensuring they are not just technically accurate but also emotionally and culturally attuned. The Stable Diffusion model, while proficient in image generation, reflects its training on a more general and globally diverse dataset. By fine-tuning it, we delicately weave the intricacies of Shan-Shui-Hua's philosophy and aesthetic principles into the AI's fabric. This process not only counters prevailing Western-centric perspectives but also fosters a generative space where technology and traditional Chinese artistry coalesce, manifesting works that are genuinely reflective of and rooted in Chinese cultural heritage.
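As a hedged illustration of what fine-tuning Stable Diffusion on such a dataset can look like, the sketch below follows the standard diffusers text-to-image training recipe: encode images to latents, add noise, and train the UNet to predict it. The model ID, hyperparameters, and stand-in data are assumptions; the project's actual fine-tuning procedure is not specified in this abstract.

```python
# Condensed fine-tuning sketch: only the UNet is updated; VAE and text encoder
# stay frozen. The stand-in dataloader below uses random tensors for brevity.
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline, DDPMScheduler

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id)
unet, vae, text_encoder, tokenizer = pipe.unet, pipe.vae, pipe.text_encoder, pipe.tokenizer
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

vae.requires_grad_(False)
text_encoder.requires_grad_(False)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

# Stand-in dataloader: in practice this iterates over curated painting/caption pairs.
dataloader = [(torch.randn(2, 3, 512, 512), ["shan shui landscape, ink wash"] * 2)]

for images, captions in dataloader:
    # Encode images into latents and add noise at a random timestep.
    latents = vae.encode(images).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (latents.shape[0],))
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    # Encode the captions with the frozen CLIP text encoder.
    tokens = tokenizer(list(captions), padding="max_length",
                       max_length=tokenizer.model_max_length,
                       truncation=True, return_tensors="pt")
    text_embeds = text_encoder(tokens.input_ids)[0]

    # Train the UNet to predict the added noise.
    pred = unet(noisy_latents, timesteps, encoder_hidden_states=text_embeds).sample
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```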
The final output of this project, a video animation and a collection of scroll paintings generated using our fine-tuned models, demonstrates that incorporating more culturally diverse datasets significantly lessens the bias in practical machine-learning creativity. The results not only show the possibility of increasing diversity in machine-learning generative models, and thus improving the performance of pre-trained models, but also capture the stylistic nuances of Chinese landscape painting fueled by AI’s unique synthetic ability. What’s more, this fusion of past, present, and future showcases another fundamental characteristic of AI: its inherent capacity to transcend time. Viewers are offered a captivating experience of “traveling” through the river of time, seamlessly immersing them in a contemporary re-embodiment of the rich, centuries-old tradition of artistic creativity, encapsulating the timeless essence of human expression and experience.
Kiss/Crash
Kiss/Crash is a multi-screen work exploring AI imagery and representation, as well as the autobiographical themes of loneliness, desire, and intimacy in the digital age. The installation consists of three individual works in a shared space, Kiss/Crash, Me Kissing Me, and Crash Me, Gently, all of which play with augmenting, inverting, and negating the iconic image of the kiss using AI image translation. Repurposing a classic Hollywood aesthetic through a queer lens, the piece reflects on the nature of images and places AI models within a history of image-production technologies meant to arouse and homogenize our desires. In the process, it reveals the logic of AI imagery and hints at how our relationship to reality will continue to be stretched and shaped by artificial representations at an accelerating pace. This piece celebrates diversity by bringing a unique queer perspective to generative AI, questioning how homogeneous representations of love might haunt our AI-mediated future and how LGBT artists can playfully resist and invert that dominant narrative.
The WHOOPS! Gallery: An Intersection of AI, Creativity, and the Unusual
The WHOOPS! art gallery presents 500 AI-generated images that challenge common sense perceptions. Resulting from a collaboration between AI researchers and human designers, the collection underscores disparities in visual commonsense reasoning between machines and humans. While humans readily identify the anomalies, contemporary AI models struggle, highlighting gaps in AI understanding. This study offers insights into the evolving interplay between human cognition, art, and artificial intelligence.
Collaborative Synthscapes from Words
Nikhil Singh · Manuel Cherep · Jessica Shand
Modular synthesizers have long offered endless possibilities for sound design, but they involve a large number of components to patch together and parameters to tune, which makes them difficult for many to explore effectively. The system we have developed, which we call CTAG (Creative Text-to-Audio Generation), invites everyone to explore these creative possibilities by imagining sounds and intuitively describing them in words, from which it controls the synthesizer's parameters to create diverse, artistic renderings.
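The abstract does not detail CTAG's internals, but the general idea of turning a text description into synthesizer parameters can be sketched as a search over a small parametric synth, scored by a joint text-audio embedding model. The toy two-oscillator synth, the random search, and the use of the open-source laion_clap package as the similarity scorer below are all illustrative assumptions, not CTAG's actual implementation.

```python
# Illustrative sketch: search for synth parameters whose audio best matches a
# text prompt under a joint text-audio embedding (here, LAION-CLAP).
import numpy as np
import laion_clap

SR = 48_000  # sample rate expected by the CLAP audio encoder

def toy_synth(params, duration=4.0):
    """Tiny FM-style synth: params = (carrier_hz, mod_hz, mod_depth, decay)."""
    carrier_hz, mod_hz, mod_depth, decay = params
    t = np.linspace(0.0, duration, int(SR * duration), endpoint=False)
    audio = np.sin(2 * np.pi * carrier_hz * t + mod_depth * np.sin(2 * np.pi * mod_hz * t))
    return (audio * np.exp(-decay * t)).astype(np.float32)

clap = laion_clap.CLAP_Module(enable_fusion=False)
clap.load_ckpt()  # downloads a default pretrained checkpoint

def similarity(audio, text):
    """Cosine similarity between CLAP audio and text embeddings."""
    a = clap.get_audio_embedding_from_data(x=audio[None, :], use_tensor=False)[0]
    t = clap.get_text_embedding([text], use_tensor=False)[0]
    return float(np.dot(a, t) / (np.linalg.norm(a) * np.linalg.norm(t)))

def describe_to_patch(prompt, n_candidates=64, seed=0):
    """Random search over synth parameters, keeping the best-scoring patch."""
    rng = np.random.default_rng(seed)
    best_params, best_score = None, -np.inf
    for _ in range(n_candidates):
        params = (rng.uniform(60, 2000), rng.uniform(0.5, 200),
                  rng.uniform(0.0, 10.0), rng.uniform(0.1, 3.0))
        s = similarity(toy_synth(params), prompt)
        if s > best_score:
            best_params, best_score = params, s
    return best_params

patch = describe_to_patch("a sound that reminds me of rain on a tin roof")
```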
For this project, we propose to invite attendees to co-create a set of soundscapes using CTAG. In alignment with the theme of celebrating diversity, each of the soundscapes will be oriented around a simple but thought-provoking question. Possible prompts include, but are not limited to: What is a sound that reminds you of your childhood? What is a sound that you associate with your cultural identity? What do you hear when you think of home?
This project invites members of the public to provide their own answers to each of these questions as text inputs to the system. By enabling participants to explore and play with generated sounds, it also encourages them to consider the similarities and differences that animate this community, all through sound.