Audio-Visual Generation

About

Recent years have seen significant advances in visual generation that have reshaped the research landscape. However, real-world information is conveyed through multiple senses at once, so fusing the audio and visual modalities is essential both for understanding and replicating human perception and for serving diverse real-world applications. The integration of audio and visual information has accordingly emerged as a critical area of research in computer vision and machine learning, with applications in multimedia analysis, virtual reality, advertising, and cinema, ranging from immersive gaming environments to lifelike simulations for medical training.

Despite these compelling motivations, audio-visual understanding and generation remain relatively underexplored in our community compared to traditional vision-only approaches. Our mission is to foster progress in this promising and impactful domain by inspiring deeper engagement and providing dedicated platforms, such as international workshops, for discussion and the exchange of the latest advancements.