
Bootstrap Latent Representations for Multi-modal Recommendation [PDF]

open access: gold
The Web Conference, 2023
This paper studies the multi-modal recommendation problem, where the item multi-modality information (e.g., images and textual descriptions) is exploited to improve the recommendation accuracy.
Xin Zhou   +7 more
openalex   +3 more sources

MMBench: Is Your Multi-modal Model an All-around Player? [PDF]

open access: yes
European Conference on Computer Vision, 2023
Large vision-language models (VLMs) have recently achieved remarkable progress, exhibiting impressive multimodal perception and reasoning abilities. However, effectively evaluating these large VLMs remains a major challenge, hindering future development ...
Yuanzhan Liu   +11 more
semanticscholar   +1 more source

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark [PDF]

open access: yes
Computer Vision and Pattern Recognition, 2023
With the rapid development of Multi-modal Large Language Models (MLLMs), a number of diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities of these models.
Kunchang Li   +11 more
semanticscholar   +1 more source

MaPLe: Multi-modal Prompt Learning [PDF]

open access: yes
Computer Vision and Pattern Recognition, 2022
Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well ...
Muhammad Uzair Khattak   +4 more
semanticscholar   +1 more source

SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities [PDF]

open access: yes
Conference on Empirical Methods in Natural Language Processing, 2023
Multi-modal large language models are regarded as a crucial step towards Artificial General Intelligence (AGI) and have garnered significant interest with the emergence of ChatGPT.
Dong Zhang   +6 more
semanticscholar   +1 more source

Visual Prompt Multi-Modal Tracking [PDF]

open access: yes
Computer Vision and Pattern Recognition, 2023
Visible-modal object tracking gives rise to a series of downstream multi-modal tracking tributaries. To inherit the powerful representations of the foundation model, a natural modus operandi for multi-modal tracking is full fine-tuning on the RGB-based ...
Jiawen Zhu   +4 more
semanticscholar   +1 more source

UniXcoder: Unified Cross-Modal Pre-training for Code Representation [PDF]

open access: yes
Annual Meeting of the Association for Computational Linguistics, 2022
Pre-trained models for programming languages have recently demonstrated great success on code intelligence. To support both code-related understanding and generation tasks, recent works attempt to pre-train unified encoder-decoder models.
Daya Guo   +5 more
semanticscholar   +1 more source

MDETR - Modulated Detection for End-to-End Multi-Modal Understanding [PDF]

open access: yes
IEEE International Conference on Computer Vision, 2021
Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of interest from the image. However, this crucial module is typically used as a black box, trained independently of the downstream task and on a fixed vocabulary of ...
Aishwarya Kamath   +5 more
semanticscholar   +1 more source

Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning [PDF]

open access: yes
Neural Information Processing Systems, 2022
We present modality gap, an intriguing geometric phenomenon of the representation space of multi-modal models. Specifically, we show that different data modalities (e.g.
Weixin Liang   +4 more
semanticscholar   +1 more source

MIMIC-IT: Multi-Modal In-Context Instruction Tuning [PDF]

open access: yes
arXiv.org, 2023
High-quality instructions and responses are essential for the zero-shot performance of large language models on interactive natural language tasks. For interactive vision-language tasks involving intricate visual scenes, a large quantity of diverse and ...
Bo Li   +7 more
semanticscholar   +1 more source
