Multimodal data is available in many applications, such as e-commerce product listings, social media posts, and short videos. However, existing algorithms dealing with these types of data still focus on uni-modal representation learning via vision-language alignment and cross-modal retrieval. In this workshop, we aim to introduce a new retrieval problem in which both queries and documents are multimodal. With the growing popularity of vision-language modeling, large language models (LLMs), retrieval-augmented generation (RAG), and multimodal LLMs, we see many new opportunities for multimodal representation and retrieval tasks. This will be a half-day workshop dedicated to multimodal representation and retrieval. The agenda includes keynote speeches, oral presentations, and an interactive panel discussion.
Submissions of short papers must be in English, in PDF format, and at most 4 pages long (including figures, tables, proofs, appendices, acknowledgments, and any content except references), with unrestricted space for references, in the current ACM two-column conference format. Suitable LaTeX, Word, and Overleaf templates are available from the ACM Website (use the "sigconf" proceedings template for LaTeX and the Interim Template for Word). ACM's CCS concepts and keywords are required for review.
For LaTeX, the following should be used:
\documentclass[sigconf,natbib=true,anonymous=true]{acmart}
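For reference, a minimal skeleton along these lines might look as follows; the title, author block, CCS concept, keywords, and bibliography file below are placeholders, not requirements beyond what the call above states:

```latex
\documentclass[sigconf,natbib=true,anonymous=true]{acmart}

\begin{document}

% Placeholder title and anonymized author block (submissions are double-blind)
\title{Your Short Paper Title}
\author{Anonymous Author(s)}
\affiliation{\institution{Anonymous Institution}\country{}}

\begin{abstract}
One-paragraph abstract of the submission.
\end{abstract}

% CCS concepts and keywords are required for review.
% The XML can be generated with the ACM CCS tool (https://dl.acm.org/ccs);
% the concept used here is only an example.
\begin{CCSXML}
<ccs2012>
 <concept>
  <concept_id>10002951.10003317</concept_id>
  <concept_desc>Information systems~Information retrieval</concept_desc>
  <concept_significance>500</concept_significance>
 </concept>
</ccs2012>
\end{CCSXML}
\ccsdesc[500]{Information systems~Information retrieval}
\keywords{multimodal retrieval, representation learning}
\maketitle

% Body of the paper: at most 4 pages including everything except references.

% References do not count toward the page limit.
\bibliographystyle{ACM-Reference-Format}
\bibliography{references}
\end{document}
```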
Submissions must be anonymous and should be submitted electronically via EasyChair:
Image Captioning for Baidu Ad Image Generation with Multi-Stage Refinements.
Time | Activity | Host |
---|---|---|
9:00 AM - 9:05 AM | Opening Remarks | Doug Gray |
9:05 AM - 9:35 AM | Keynote Address by Hamed Zamani | Doug Gray |
9:35 AM - 10:35 AM | Oral Presentations | Xinliang Zhu |
10:35 AM - 10:45 AM | Coffee Break | - |
10:45 AM - 11:15 AM | Keynote Address by Dinesh Manocha | Arnab Dhua |
11:15 AM - 11:45 AM | Panel Discussion | Arnab Dhua |
11:45 AM - 11:50 AM | Closing Remarks | Xinliang Zhu |
11:50 AM - 12:15 PM | Networking | - |
Abstract: Information access systems, such as search engines and recommender systems, have long supported people in accomplishing a wide range of tasks. In this talk, I will discuss how one can broaden the scope of users of information access systems to include task-driven machines, such as generative AI models. In this way, the core principles of indexing, representation, retrieval, and ranking can be applied and extended to substantially improve model generalization, scalability, robustness, and interpretability. I will describe a generic retrieval-enhanced machine learning (REML) framework and connect this framework with the information retrieval literature. I will next introduce our recent implementations of REML for various language and vision tasks. Finally, I will discuss open problems in this area for future exploration.
Abstract: Perceiving and understanding non-speech sounds and non-verbal speech are essential for making informed decisions that facilitate our interactions with our surroundings. Audio is a crucial modality, offering rich, contextual information that complements visual and textual data, thereby enhancing the capabilities of AI systems. In this talk, we will highlight the significance of audio as an integral component in developing the next generation of intelligent AI agents. We will explain why audio is an indispensable modality for AI, showing how humans naturally and extensively rely on auditory cues to navigate and comprehend the physical and virtual worlds. Understanding auditory signals is fundamental to creating AI systems that can interact with the world in a more human-like and intuitive manner. Next, we will discuss how contemporary AI systems are beginning to integrate audio perception with other modalities to achieve more holistic and accurate environmental awareness. We will describe key advancements and methodologies that enable these multimodal integrations, focusing on the role of audio encoders and large language models (LLMs) in this synergy. Finally, we will address the open challenges and future directions in the field of audio question answering and multimodal AI.