2nd Workshop on AVGenL


Room 327, afternoon of Oct 20, held in conjunction with ICCV 2025, Honolulu, USA


Overview

CALL FOR POSTERS
We invite extended-abstract submissions of works already accepted by recent CVPR/ICCV/ECCV/ICML/ICLR/NeurIPS or IJCV/TPAMI/TMLR. In addition, we will invite accepted papers from both the main conference (ICCV 2025) and the IJCV special issue on audio-visual generation for presentation.

CALL FOR DEMOS
We also called for demos from industry; however, we are no longer accepting industrial demo submissions.


In this workshop, we aim to shine a spotlight on this exciting yet under-explored field by prioritizing new approaches to audio-visual generation, while also covering a broad range of topics in audio-visual learning, where the convergence of auditory and visual signals unlocks opportunities for advancing creativity, understanding, and machine perception. We hope the workshop will bring together researchers, practitioners, and enthusiasts from diverse disciplines in both academia and industry to discuss the latest developments, challenges, and breakthroughs in audio-visual generation and learning. The workshop will mainly cover, but is not limited to, the following topics:

  • Audio-visual generation, including joint audio-visual or cross-modal generation
    • Image/video-driven audio generation
    • Audio-driven visual media generation
    • Dancing video and talking head animation
  • Audio-visual foundation models
  • Audio-visual representation learning and transfer learning
  • Audio-visual learning applications
    • Applications in scene understanding
    • Applications in localization
  • Audio-visual benchmarks, such as datasets and evaluation metrics
  • Ethical considerations in audio-visual research


Program

Schedule (Oct 20, 2025)

1:00 - 1:10 pm | Opening remarks | Organizers
1:10 - 1:50 pm | Keynote | William T. Freeman
1:50 - 2:30 pm | Keynote | Tae-Hyun Oh
2:30 - 2:45 pm | Sponsor talk | Abaka AI
2:45 - 3:15 pm | Poster session and break | Papers accepted by ICCV 2025 and the IJCV Special Issue
3:15 - 3:30 pm | Industrial demo | Veo 3, presented by Mohammad Babaeizadeh (Google DeepMind)
3:30 - 4:10 pm | Keynote | Chenliang Xu (online)
4:10 - 4:50 pm | Keynote | Andrew Owens
4:50 - 5:00 pm | Closing remarks | Organizers


Poster Session

  • How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes (ICCV 2025). Mahnoor Fatima Saad, Ziad Al-Halah.
  • Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation (ICCV 2025). Fa-Ting Hong, Zunnan Xu, Zixiang Zhou, Jun Zhou, Xiu Li, Qin Lin, Qinglin Lu, Dan Xu.
  • AV-Flow: Transforming Text to Audio-Visual Human-like Interactions (ICCV 2025). Aggelina Chatziagapi, Louis-Philippe Morency, Hongyu Gong, Michael Zollhoefer, Dimitris Samaras, Alexander Richard.
  • TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis (ICCV 2025). Tri Ton, Ji Woo Hong, Chang D. Yoo.
  • Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations (ICCV 2025). Jeong Hun Yeo*, Minsu Kim*, Chae Won Kim, Stavros Petridis, and Yong Man Ro.
  • What's Making That Sound Right Now? Video-centric Audio-Visual Localization (ICCV 2025). Hahyeon Choi, Junhoo Lee, Nojun Kwak.
  • GaussianSpeech: Audio-Driven Personalized 3D Gaussian Avatars (ICCV 2025). Shivangi Aneja, Artem Sevastopolsky, Tobias Kirschstein, Justus Thies, Angela Dai, Matthias Nießner.
  • AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs (ICCV 2025). Sanjoy Chowdhury, Hanan Gani, Nishit Anand, Sayan Nag, Ruohan Gao, Mohamed Elhoseiny, Salman Khan, Dinesh Manocha.
  • FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait (ICCV 2025). Taekyung Ki, Dongchan Min, Gyeongsu Chae.
  • VGGSounder: Audio-Visual Evaluations for Foundation Models (ICCV 2025). Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu, Matthias Bethge, Wieland Brendel, A. Sophia Koepke.
  • Taming Data and Transformers for Audio Generation (IJCV SI). Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Guha Balakrishnan, Vicente Ordonez.
  • Hearing and Seeing Through CLIP: A Framework for Self-Supervised Sound Source Localization (IJCV SI). Sooyoung Park, Arda Senocak, Joon Son Chung.
  • Audio-Guided Video Scene Editing (IJCV SI). Kaixin Shen, Ruijie Quan, Linchao Zhu, Dong Zheng, Jun Xiao, Yi Yang.

Demo Session

Showcasing the latest products and innovations.

Sponsors