Overview
CALL FOR POSTERS
We invite extended abstract submissions of work already accepted at recent CVPR, ICCV, ECCV, ICML, ICLR, or NeurIPS conferences, or in IJCV, TPAMI, or TMLR.
Additionally, we will invite authors of accepted papers from both the main conference (ICCV 2025) and the IJCV special issue on audio-visual generation to present their work at the workshop.
CALL FOR DEMOS
We also called for demos from industry. Industrial demo submissions are now closed.
In this workshop, we aim to shine a spotlight on this exciting yet under-investigated field by prioritizing new approaches in audio-visual generation, while also covering a wide range of topics in audio-visual learning, where the convergence of auditory and visual signals opens up many opportunities for advancing creativity, understanding, and machine perception. We hope to bring together researchers, practitioners, and enthusiasts from diverse disciplines in academia and industry to explore the latest developments, challenges, and breakthroughs in audio-visual generation and learning. Topics of interest include, but are not limited to:
- Audio-visual generation, including joint audio-visual or cross-modal generation
- Image/video-driven audio generation
- Audio-driven visual media generation
- Dance video generation and talking head animation
- Audio-visual foundation models
- Audio-visual representation learning and transfer learning
- Audio-visual learning applications
  - Application to scene understanding
  - Application to localization
- Audio-visual benchmarks, such as datasets and evaluation metrics
- Ethical considerations in audio-visual research
Program
Schedule (October 20, 2025)
- Opening remarks: Organizers
- Keynote: William T. Freeman
- Keynote: Tae-Hyun Oh
- Sponsor: Abaka AI
- Poster session and break: papers accepted at ICCV 2025 and the IJCV special issue
- Industrial demo: Veo 3, presented by Mohammad Babaeizadeh (Google DeepMind)
- Keynote: Chenliang Xu (online)
- Keynote: Andrew Owens
- Closing remarks: Organizers
Poster Session
- How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes (ICCV 2025). Mahnoor Fatima Saad, Ziad Al-Halah.
- Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation (ICCV 2025). Fa-Ting Hong, Zunnan Xu, Zixiang Zhou, Jun Zhou, Xiu Li, Qin Lin, Qinglin Lu, Dan Xu.
- AV-Flow: Transforming Text to Audio-Visual Human-like Interactions (ICCV 2025). Aggelina Chatziagapi, Louis-Philippe Morency, Hongyu Gong, Michael Zollhoefer, Dimitris Samaras, Alexander Richard.
- TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis (ICCV 2025). Tri Ton, Ji Woo Hong, Chang D. Yoo.
- Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations (ICCV 2025). Jeong Hun Yeo*, Minsu Kim*, Chae Won Kim, Stavros Petridis, Yong Man Ro.
- What's Making That Sound Right Now? Video-centric Audio-Visual Localization (ICCV 2025). Hahyeon Choi, Junhoo Lee, Nojun Kwak.
- GaussianSpeech: Audio-Driven Personalized 3D Gaussian Avatars (ICCV 2025). Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein, Justus Thies, Angela Dai, Matthias Nießner.
- AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs (ICCV 2025). Sanjoy Chowdhury, Hanan Gani, Nishit Anand, Sayan Nag, Ruohan Gao, Mohamed Elhoseiny, Salman Khan, Dinesh Manocha.
- FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait (ICCV 2025). Taekyung Ki, Dongchan Min, Gyeongsu Chae.
- VGGSounder: Audio-Visual Evaluations for Foundation Models (ICCV 2025). Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu, Matthias Bethge, Wieland Brendel, A. Sophia Koepke.
- Taming Data and Transformers for Audio Generation (IJCV SI). Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Guha Balakrishnan, Vicente Ordonez.
- Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization (IJCV SI). Sooyoung Park, Arda Senocak, Joon Son Chung.
- Audio-Guided Video Scene Editing (IJCV SI). Kaixin Shen, Ruijie Quan, Linchao Zhu, Dong Zheng, Jun Xiao, Yi Yang.