Overview
CALL FOR REVIEWERS
We're looking for experienced reviewers to help shape the latest research in this exciting domain! If you're interested, please fill out this form.
CALL FOR PAPERS
We're accepting high-quality papers on the topic of audio-visual generation (and learning).
The flyer is available here.
The ability to simulate and reason about the physical world is central to human intelligence. We perceive our surroundings and construct mental models that let us internally simulate possible outcomes, enabling reasoning, planning, and action; in this sense, the mind acts as a "world simulator". Building such a world simulator is likewise crucial for human-like AI systems that must interact effectively with dynamic, complex environments. Recent research suggests that high-fidelity video generation models are a promising path toward comprehensive and efficient world simulators. The physical world, however, is inherently multimodal: human perception relies not only on visual stimuli but also on sound, which often conveys critical information that complements what we see, yielding a richer and more nuanced understanding of the environment. Creating world simulators capable of mimicking human-like perception and reasoning therefore requires coherent audio-visual generative models. Despite this, most modern approaches target vision-only or language-vision modalities, paying comparatively little attention to understanding and generating integrated audio-visual signals.
This special issue aims to spotlight the exciting yet underexplored field of audio-visual generation as a key stepping stone toward multimodal world simulators. Our goal is to prioritize innovative approaches to this multimodal integration, advancing both the generation and the analysis of audio-visual content, and to explore the broader impacts of this research. Moreover, in line with the classical concept of analysis-by-synthesis, advances in audio-visual generation can drive improvements in analysis and understanding methods, reinforcing the symbiotic relationship between the two areas. This research is not merely about content creation; it has the potential to become a fundamental building block of more advanced, human-like AI systems.
Scope
Guest Editors
The team is led by Tae-Hyun Oh.
Submission Guidelines
Please submit via the IJCV Editorial Manager: www.editorialmanager.com/visi. Choose "SI: Audio-Visual Generation" from the dropdown menu. Refer to the official site for details.
FAQ
Important Dates
Upon request, we have extended the submission deadline to May 2 '25. Papers submitted early will be processed on a rolling basis.
| Milestone | Deadline |
| --- | --- |
| Manuscript Submission Deadline | May 02 '25, 11:59 PM AoE |
| First Review Notification | June 25 '25, 11:59 PM AoE |
| Revised Manuscript Submission | July 10 '25, 11:59 PM AoE |
| Final Review Notification | August 10 '25, 11:59 PM AoE |
| Final Manuscript Submission | September 20 '25, 11:59 PM AoE |