Multimodal data is ubiquitous in applications such as e-commerce product listings, social media posts, and short videos. However, existing algorithms for such data still focus on learning uni-modal representations through vision-language alignment and cross-modal retrieval. In this workshop, we aim to introduce a new retrieval problem in which both queries and documents are multimodal. With the growing popularity of vision-language modeling, large language models (LLMs), retrieval-augmented generation (RAG), and multimodal LLMs, we see many new opportunities for multimodal representation and retrieval tasks. This comprehensive half-day workshop will focus on multimodal representation and retrieval, with an agenda that includes keynote speeches, oral presentations, and an interactive panel discussion.
Submissions of short papers must be in English, in PDF format, and at most 4 pages long (including figures, tables, proofs, appendices, acknowledgments, and all other content except references), with unrestricted space for references, in the current ACM two-column conference format. Suitable LaTeX, Word, and Overleaf templates are available from the ACM website (use the "sigconf" proceedings template for LaTeX and the Interim Template for Word). ACM CCS concepts and keywords are required for review.
For LaTeX, the following should be used:
\documentclass[sigconf,natbib=true,anonymous=true]{acmart}
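As a reference point, below is a minimal sketch of an anonymous sigconf submission skeleton; the title, author block, CCS concept, keywords, and bibliography file name are illustrative placeholders, not requirements beyond those stated above:

\documentclass[sigconf,natbib=true,anonymous=true]{acmart}

\begin{document}

\title{Paper Title} % placeholder title

% The anonymous option hides author identities in the compiled PDF.
\author{Author Name}
\affiliation{%
  \institution{Institution}
  \country{Country}}

\begin{abstract}
Abstract text.
\end{abstract}

% CCS concepts and keywords are required for review;
% the concept and keywords below are placeholders.
\ccsdesc{Information systems~Information retrieval}
\keywords{multimodal representation, multimodal retrieval}

\maketitle

\section{Introduction}
Paper body.

\bibliographystyle{ACM-Reference-Format}
\bibliography{references} % references.bib is a placeholder file name

\end{document}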
Submissions must be anonymous and should be submitted electronically via EasyChair:
Abstract to come...