Poster in Workshop: NeurIPS 2023 Workshop on Machine Learning for Creativity and Design
HARP: Bringing Deep Learning to the DAW with Hosted, Asynchronous, Remote Processing
Hugo Flores Garcia · Christodoulos Benetatos · Patrick O'Reilly · Aldo Aguilar · Zhiyao Duan · Bryan Pardo
Deep learning models have the potential to transform how artists interact with audio across a range of creative applications. While digital audio workstations (DAWs) like Logic or Pro Tools are the most popular software environments for producing audio, state-of-the-art deep learning models are typically available as Python repositories or web demonstrations (e.g., Gradio apps). Attempts to bridge this divide have focused on deploying lightweight models as DAW plug-ins that run in real time, locally on the CPU. This often requires significant modifications to the models, and it precludes large, compute-heavy models and alternative interaction paradigms (e.g., text-to-audio). To bring state-of-the-art models into the hands of artistic creators, we release HARP, a free Audio Random Access (ARA) plug-in for DAWs. HARP supports [h]osted, [a]synchronous, [r]emote [p]rocessing with deep learning models by routing audio from the DAW through Gradio endpoints. Through HARP, Gradio-compatible models hosted on the web (e.g., on Hugging Face Spaces) become directly usable within the DAW. Using our API, developers can define interactive controls and audio processing logic within their Gradio endpoint. A sound artist can then enter the model's URL into a dialog box in the HARP plug-in, and the plug-in interface will automatically populate controls, prepare routing, and render any processed audio. Thus, sound artists can create and modify audio using deep learning models in-DAW, maintaining an unbroken creative workflow.
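The workflow described in the abstract (an endpoint declares its controls and processing logic, the host populates those controls and renders audio asynchronously) can be illustrated with a minimal, standard-library-only sketch. This is not HARP's API or Gradio's: `Slider`, `ToyEndpoint`, and `ToyHost` are hypothetical names, the "remote" model is a local function run on a worker thread, and the audio is a plain list of floats.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Slider:
    """Hypothetical control description an endpoint might expose."""
    label: str
    minimum: float
    maximum: float
    default: float


class ToyEndpoint:
    """Stand-in for a hosted model (assumption: not HARP's or Gradio's API)."""

    # Controls the host should display for this model.
    controls = [Slider("gain", 0.0, 2.0, 1.0)]

    def process(self, samples, gain):
        # Placeholder for model inference: scale each sample by the gain.
        return [s * gain for s in samples]


class ToyHost:
    """Stand-in for the plug-in side: builds controls, submits audio jobs."""

    def __init__(self, endpoint):
        self.endpoint = endpoint
        # Populate control values from the endpoint's declared defaults.
        self.values = {c.label: c.default for c in endpoint.controls}
        self._pool = ThreadPoolExecutor(max_workers=1)

    def set_control(self, label, value):
        # Clamp the new value to the range declared by the endpoint.
        control = next(c for c in self.endpoint.controls if c.label == label)
        self.values[label] = min(max(value, control.minimum), control.maximum)

    def submit(self, samples):
        # Return immediately with a Future so the caller is not blocked.
        return self._pool.submit(self.endpoint.process, samples, **self.values)


host = ToyHost(ToyEndpoint())
host.set_control("gain", 0.5)
future = host.submit([0.2, -0.4, 1.0])  # non-blocking submission
rendered = future.result(timeout=5)     # collect the processed audio later
```

The point of the sketch is the division of labor: the endpoint owns both the control definitions and the processing logic, so the host needs no model-specific code, and submitting work as a future keeps the host responsive while processing runs elsewhere.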