Results 331 to 340 of about 1,322,987 (378)
Some of the following articles may not be open access.

Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design

International Conference on Machine Learning
Combining discrete and continuous data is an important capability for generative models. We present Discrete Flow Models (DFMs), a new flow-based model of discrete data that provides the missing link in enabling flow-based generative models to be applied ...
Andrew Campbell   +4 more
semanticscholar   +1 more source

BLINK: Multimodal Large Language Models Can See but Not Perceive

European Conference on Computer Vision
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations.
Xingyu Fu   +9 more
semanticscholar   +1 more source

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Annual Meeting of the Association for Computational Linguistics
This paper introduces MMMU-Pro, a robust version of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark. MMMU-Pro rigorously assesses multimodal models' true understanding and reasoning capabilities through a three-step ...
Xiang Yue   +13 more
semanticscholar   +1 more source

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Computer Vision and Pattern Recognition
We introduce Janus, an autoregressive framework that unifies multimodal understanding and generation. Prior research often relies on a single visual encoder for both tasks, such as Chameleon.
Chengyue Wu   +10 more
semanticscholar   +1 more source

Multimodality

This chapter considers multimodality in Casey O'Callaghan's strict sense of the term: a perceptual representation's object is represented neither by a single sense modality nor merely as a collection of features each of which is represented by a single modality. By definition, multimodal representation cannot be simply a case of layering.
Denise Newfield, Sarah Crinall
openaire   +2 more sources

HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale

arXiv.org
The rapid development of multimodal large language models (MLLMs), such as GPT-4V, has led to significant advancements. However, these models still face challenges in medical multimodal capabilities due to limitations in the quantity and quality of ...
Junying Chen   +11 more
semanticscholar   +1 more source

Hallucination of Multimodal Large Language Models: A Survey

arXiv.org
This survey presents a comprehensive analysis of the phenomenon of hallucination in multimodal large language models (MLLMs), also known as Large Vision-Language Models (LVLMs), which have demonstrated significant advancements and remarkable abilities in ...
Zechen Bai   +6 more
semanticscholar   +1 more source

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

arXiv.org
Existing open-source multimodal large language models (MLLMs) generally follow a training process involving pre-training and supervised fine-tuning.
Weiyun Wang   +10 more
semanticscholar   +1 more source

A Multilevel Multimodal Fusion Transformer for Remote Sensing Semantic Segmentation

IEEE Transactions on Geoscience and Remote Sensing
Accurate semantic segmentation of remote sensing data plays a crucial role in the success of geoscience research and applications. Recently, multimodal fusion-based segmentation models have attracted much attention due to their outstanding performance as ...
Xianping Ma   +3 more
semanticscholar   +1 more source

Multimodal Interfaces

Artificial Intelligence Review, 1996
Alex Waibel   +3 more
openaire   +1 more source
