Multimodal representation learning is central to modern AI, enabling applications across retrieval, generation, RAG, reasoning, agentic AI, and embodied intelligence. With the growing ubiquity of multimodal data—from e-commerce listings to social media and video content—new challenges arise in multimodal retrieval, where both queries and indexed content span multiple modalities. This task requires deeper semantic understanding and reasoning, especially at scale, where data complexity and noise become significant hurdles. Following the success of our first edition, the second Multimodal Representation and Retrieval Workshop at ICCV 2025 will continue to foster progress in this critical area. The half-day event will feature keynote talks, an invited talk, and oral and poster presentations.
Our objective with this workshop is to capture the interest of researchers in the emerging field of multimodal retrieval and representation learning. As users increasingly rely on LLM-based agents to interact with the world, the tools for retrieving relevant information will need to evolve to serve agents as well as human users. We anticipate that the workshop will serve as a catalyst for establishing a dedicated community focused on this topic. By highlighting the novelty and significance of the problem, we aim to attract researchers who are eager to explore and contribute to this field. We invite original research and industrial application papers on learning multimodal representations and building multimodal retrieval systems.
Submissions must be in English, in PDF format, and at most 8 pages long (including figures, tables, proofs, appendices, acknowledgments, and all other content except references), with unrestricted space for references, in the ICCV style. Please download the ICCV 2025 Author Kit for detailed formatting instructions.
Papers that are not properly anonymized, do not use the template, or are fewer than four pages or more than eight pages long (excluding references) will be rejected without review. We expect at least one author of each submission to be available as a reviewer.
Submissions must be made electronically:
Accepted papers will appear in the ICCV proceedings by default unless the authors separately notify the organizers (email: mrr-2025-iccv@googlegroups.com) before Jul 3 (11:59 pm PST).
Time | Session |
---|---|
8:30 - 8:35 am | Opening Remarks |
8:35 - 8:55 am | Invited Talk - Roi Herzig |
8:55 - 9:35 am | Keynote: Cordelia Schmid |
9:35 - 9:50 am | Rate–Distortion Limits for Multimodal Retrieval: Theory, Optimal Codes, and Finite-Sample Guarantees |
9:50 - 10:35 am | Coffee Break & Poster Session |
10:35 - 10:50 am | MIND-RAG: Multimodal Context-Aware and Intent-Aware Retrieval-Augmented Generation for Educational Publications |
10:50 - 11:30 am | Keynote: Jianwei Yang |
11:30 - 11:45 am | Refining Skewed Perceptions in Vision-Language Contrastive Models through Visual Representations |
11:45 am - 12:25 pm | Keynote: Kristen Grauman |
12:25 - 12:30 pm | Closing Remarks |
For any questions, please email mrr-2025-iccv@googlegroups.com.