Sat 6:30 a.m. - 6:40 a.m.
|
Opening remarks
(
Opening remarks
)
>
SlidesLive Video
|
Brian Kulis
🔗
|
Sat 6:40 a.m. - 7:00 a.m.
|
Computer Audition Disrupted 2.0: The Foundation Models Era
(
Invited talk
)
>
SlidesLive Video
|
Bjoern Schuller
🔗
|
Sat 7:00 a.m. - 7:20 a.m.
|
Explainable AI for Audio via Virtual Inspection Layers
(
Oral
)
>
SlidesLive Video
|
Johanna Vielhaben · Sebastian Lapuschkin · Grégoire Montavon · Wojciech Samek
🔗
|
Sat 7:20 a.m. - 7:40 a.m.
|
Self-Supervised Speech Enhancement using Multi-Modal Data
(
Oral
)
>
SlidesLive Video
|
Yu-Lin Wei · Rajalaxmi Rajagopalan · Bashima Islam · Romit Roy Choudhury
🔗
|
Sat 7:40 a.m. - 8:10 a.m.
|
A multi-view approach for audio-based speech emotion recognition
(
Invited talk
)
>
SlidesLive Video
|
Dimitra Emmanouilidou
🔗
|
Sat 8:10 a.m. - 8:50 a.m.
|
Coffee break
|
🔗
|
Sat 8:50 a.m. - 9:10 a.m.
|
Audio Language Models
(
Invited talk
)
>
SlidesLive Video
|
Neil Zeghidour
🔗
|
Sat 9:10 a.m. - 9:30 a.m.
|
Zero-shot audio captioning with audio-language model guidance and audio context keywords
(
Oral
)
>
SlidesLive Video
|
Leonard Salewski · Stefan Fauth · A. Sophia Koepke · Zeynep Akata
🔗
|
Sat 9:30 a.m. - 10:00 a.m.
|
Lark: A Multimodal Foundation Model for Music
(
Invited talk
)
>
SlidesLive Video
|
Rachel Bittner
🔗
|
Sat 10:00 a.m. - 11:30 a.m.
|
Lunch break
|
🔗
|
Sat 11:30 a.m. - 1:00 p.m.
|
Poster & Demo Session
(
Poster Session
)
>
|
🔗
|
Sat 1:00 p.m. - 1:30 p.m.
|
Coffee break
|
🔗
|
Sat 1:30 p.m. - 2:00 p.m.
|
Uninformative Gradients: Optimisation pathologies in differentiable digital signal processing
(
Invited talk
)
>
SlidesLive Video
|
Ben Hayes
🔗
|
Sat 2:00 p.m. - 2:20 p.m.
|
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
(
Oral
)
>
SlidesLive Video
|
Ge Zhu · Yutong Wen · Marc-André Carbonneau · Zhiyao Duan
🔗
|
Sat 2:20 p.m. - 2:40 p.m.
|
Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech
(
Oral
)
>
SlidesLive Video
|
Mohamed Osman · Tamer Nadeem · Ghada Khoriba
🔗
|
Sat 2:40 p.m. - 3:00 p.m.
|
Audio Personalization through Human-in-the-loop Optimization
(
Oral
)
>
SlidesLive Video
|
Rajalaxmi Rajagopalan · Yu-Lin Wei · Romit Roy Choudhury
🔗
|
Sat 3:00 p.m. - 3:20 p.m.
|
Multi-channel speech enhancement for moving sources
(
Invited talk
)
>
SlidesLive Video
|
Shoko Araki
🔗
|
-
|
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
(
Poster
)
>
|
Ge Zhu · Yutong Wen · Marc-André Carbonneau · Zhiyao Duan
🔗
|
-
|
Explainable AI for Audio via Virtual Inspection Layers
(
Poster
)
>
|
Johanna Vielhaben · Sebastian Lapuschkin · Grégoire Montavon · Wojciech Samek
🔗
|
-
|
Audio classification with Dilated Convolution with Learnable Spacings
(
Poster
)
>
link
|
Ismail Khalfaoui Hassani · Timothée Masquelier · Thomas Pellegrini
🔗
|
-
|
Creative Text-to-Audio Generation via Synthesizer Programming
(
Poster
)
>
|
Nikhil Singh · Manuel Cherep · Jessica Shand
🔗
|
-
|
Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation
(
Poster
)
>
|
Ye Bai · Chenxing Li · Xiaorui Wang · Yuanyuan Zhao · Hao Li
🔗
|
-
|
Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion
(
Poster
)
>
|
Xueyao Zhang · Yicheng Gu · Haopeng Chen · Zihao Fang · Lexiao Zou · Liumeng Xue · Zhizheng Wu
🔗
|
-
|
Diffusion Models as Masked Audio-Video Learners
(
Poster
)
>
|
Elvis Nunez · Yanzi Jin · Mohammad Rastegari · Sachin Mehta · Maxwell Horton
🔗
|
-
|
InstrumentGen: Generating Sample-Based Musical Instruments From Text
(
Poster
)
>
link
|
Shahan Nercessian · Johannes Imort
🔗
|
-
|
Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization
(
Poster
)
>
|
Edward Fish · Jon Weinbren · Andrew Gilbert
🔗
|
-
|
Composing and Validating Large-Scale Datasets for Training Open Foundation Models for Audio
(
Poster
)
>
|
Marianna Nezhurina · Ke Chen · Yusong Wu · Tianyu Zhang · Haohe Liu · Yuchen Hui · Taylor Berg-Kirkpatrick · Shlomo Dubnov · Jenia Jitsev
🔗
|
-
|
Unsupervised Musical Object Discovery from Audio
(
Poster
)
>
|
Joonsu Gha · Vincent Herrmann · Benjamin F. Grewe · Jürgen Schmidhuber · Anand Gopalakrishnan
🔗
|
-
|
Data is Overrated: Perceptual Metrics Can Lead Learning in the Absence of Training Data
(
Poster
)
>
link
|
Tashi Namgyal · Alexander Hepburn · Raul Santos-Rodriguez · Valero Laparra · Jesús Malo
🔗
|
-
|
Self-Supervised Speech Enhancement using Multi-Modal Data
(
Poster
)
>
|
Yu-Lin Wei · Rajalaxmi Rajagopalan · Bashima Islam · Romit Roy Choudhury
🔗
|
-
|
Improved sound quality human-inspired DNN-based audio applications
(
Poster
)
>
|
Chuan Wen · Sarah Verhulst
🔗
|
-
|
Audio Personalization through Human-in-the-loop Optimization
(
Poster
)
>
|
Rajalaxmi Rajagopalan · Yu-Lin Wei · Romit Roy Choudhury
🔗
|
-
|
Synthia's Melody: A Benchmark Framework for Unsupervised Domain Adaptation in Audio
(
Poster
)
>
|
Harry Coppock · Chia-Hsin Lin
🔗
|
-
|
Zero-shot audio captioning with audio-language model guidance and audio context keywords
(
Poster
)
>
|
Leonard Salewski · Stefan Fauth · A. Sophia Koepke · Zeynep Akata
🔗
|
-
|
AttentionStitch: How Attention Solves the Speech Editing Problem
(
Poster
)
>
|
Antonios Alexos · Pierre Baldi
🔗
|
-
|
MusT3: Unified Multi-Task Model for Fine-Grained Music Understanding
(
Poster
)
>
|
Martin Kukla · Minz Won · Yun-Ning Hung · Duc Le
🔗
|
-
|
Benchmarks and deep learning models for localizing rodent vocalizations in social interactions
(
Poster
)
>
|
Ralph Peterson · Aramis Tanelus · Aman Choudhri · Violet Ivan · Aaditya Prasad · David Schneider · Dan Sanes · Alex Williams
🔗
|
-
|
Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech
(
Poster
)
>
|
Mohamed Osman · Tamer Nadeem · Ghada Khoriba
🔗
|
-
|
The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation
(
Poster
)
>
|
Ilaria Manco · Benno Weck · Seungheon Doh · Yixiao Zhang · Dmitry Bogdanov · Yusong Wu · Ke Chen · Philip Tovstogan · Emmanouil Benetos · Elio Quinton · George Fazekas · Juhan Nam · Minz Won
🔗
|
-
|
ScripTONES: Sentiment-Conditioned Music Generation for Movie Scripts
(
Poster
)
>
|
Vishruth Veerendranath · Vibha Masti · Utkarsh Gupta · Hrishit Chaudhuri · Gowri Srinivasa
🔗
|
-
|
Self-Supervised Music Source Separation Using Vector-Quantized Source Category Estimates
(
Poster
)
>
|
Stefan Lattner · Marco Pasini
🔗
|
-
|
Deep Generative Models of Music Expectation
(
Poster
)
>
|
Ninon Lizé Masclef · Andy Keller
🔗
|
-
|
mir_ref: A Representation Evaluation Framework for Music Information Retrieval Tasks
(
Poster
)
>
link
|
Christos Plachouras · Dmitry Bogdanov · Pablo Alonso-Jiménez
🔗
|