Poster+Demo Session
in
Workshop: Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation
Three-modal guidance for symbolic music generation: melody, structure, texture
Daniel Lucht · David Leins · Dimitri von Rütte · Alexandra Moringen
The vision of this work is flexible co-creation of music between a human and a trained model, usable with or without domain knowledge. Building upon previous work, the transformer-based FIGARO framework, we propose a symbolic music generation approach that takes three separate guiding modalities: a melody, a structural piece description termed the expert description, and musical texture. Our approach aims to enable a composer to try out combinations of different melodies, expert descriptions, and textures. FIGARO is capable of generating music based on a structural expert description created with domain knowledge and on a learned representation of a music piece. The description part of the input is generated for each bar and provides a multitude of features, such as mean pitch, chords, and note density. The learned representation is generated for each bar as a whole. The main contribution of this work is a more extensive modularisation of the model input, i.e. the explicit separation of the input into the three above-mentioned distinct modalities commonly used in music composition and in the symbolic description of musical works: melody, a domain-knowledge-driven description of the piece, and a texture guiding the feel of the music. We demonstrate our preliminary results with a novel model-based implementation of a piece, given a melody, a bar-wise description, and a multi-track accompaniment.
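To make the three-modal input concrete, the following is a minimal sketch of how the explicitly separated guidance (melody, bar-wise expert description, texture) might be structured before being fed to a sequence model. All names and fields here (ThreeModalInput, BarDescription, NoteEvent, interleave_conditioning) are hypothetical illustrations based only on the features named in the abstract, not the actual FIGARO implementation.

```python
# Hypothetical sketch of the three-modal conditioning input; names and
# fields are illustrative, not taken from the FIGARO codebase.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class BarDescription:
    """Bar-wise expert description built with domain knowledge."""
    mean_pitch: float    # average MIDI pitch of the notes in the bar
    chords: List[str]    # e.g. ["Cmaj", "G7"]
    note_density: float  # notes per beat in the bar

@dataclass
class NoteEvent:
    """One melody note in symbolic (MIDI-like) form."""
    pitch: int           # MIDI pitch number
    onset: float         # onset in beats from the start of the bar
    duration: float      # duration in beats

@dataclass
class ThreeModalInput:
    """Explicitly separated guidance: melody, expert description, texture."""
    melody: List[List[NoteEvent]]      # one list of note events per bar
    description: List[BarDescription]  # one expert description per bar
    texture: List[List[float]]         # one learned latent vector per bar

def interleave_conditioning(x: ThreeModalInput) -> List[Tuple]:
    """Assemble a bar-wise conditioning sequence: for every bar, pair
    that bar's melody events with its description and texture latent."""
    assert len(x.melody) == len(x.description) == len(x.texture)
    return list(zip(x.melody, x.description, x.texture))
```

Under this reading, a composer could swap out any one of the three lists (for example, keeping the melody and description fixed while substituting the texture latents of another piece) and regenerate, which is the combination-trying workflow the abstract describes.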