Results 321 to 330 of about 1,322,987 (378)
Some of the following articles may not be open access.

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems

Annual Meeting of the Association for Computational Linguistics
Recent advancements have seen Large Language Models (LLMs) and Large Multimodal Models (LMMs) surpassing general human capabilities in various tasks, approaching the proficiency level of human experts across multiple domains.
Chaoqun He   +13 more
semanticscholar   +1 more source

R1-Onevision: Advancing Generalized Multimodal Reasoning Through Cross-Modal Formalization

IEEE International Conference on Computer Vision
Large Language Models have demonstrated remarkable reasoning capability in complex textual tasks. However, multimodal reasoning, which requires integrating visual and textual information, remains a significant challenge.
Yi Yang   +11 more
semanticscholar   +1 more source

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Neural Information Processing Systems
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and ...
Shengbang Tong   +13 more
semanticscholar   +1 more source

Show-o2: Improved Native Unified Multimodal Models

arXiv.org
This paper presents improved native unified multimodal models, i.e., Show-o2, that leverage autoregressive modeling and flow matching. Built upon a 3D causal variational autoencoder space, unified visual representations are constructed through a ...
Jinheng Xie   +2 more
semanticscholar   +1 more source

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

arXiv.org
By extending the advantage of chain-of-thought (CoT) reasoning in human-like step-by-step processes to multimodal contexts, multimodal CoT (MCoT) reasoning has recently garnered significant research attention, especially in the integration with ...
Yaoting Wang   +6 more
semanticscholar   +1 more source

Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark

International Conference on Machine Learning
The ability to organically reason over and with both text and images is a pillar of human intelligence, yet the ability of Multimodal Large Language Models (MLLMs) to perform such multimodal reasoning remains under-explored.
Yunzhuo Hao   +6 more
semanticscholar   +1 more source

Multimodal therapy.

2000
Chapter 5 discusses multimodal therapy and how the multimodal approach provides a framework that facilitates systematic treatment selection in a broad-based, comprehensive, yet highly focused manner. It covers how the approach respects science and data-driven findings and endeavors to use empirically supported methods where possible.
openaire   +1 more source

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

arXiv.org
We introduce VisualPRM, an advanced multimodal Process Reward Model (PRM) with 8B parameters, which improves the reasoning abilities of existing Multimodal Large Language Models (MLLMs) across different model scales and families with Best-of-N (BoN ...
Weiyun Wang   +14 more
semanticscholar   +1 more source
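The VisualPRM entry above refers to Best-of-N (BoN) evaluation, a standard selection strategy in which N candidate responses are sampled and a reward model picks the best one. A minimal illustrative sketch follows; the scoring function here is a hypothetical stand-in, not the paper's actual process reward model.

```python
# Best-of-N (BoN) selection: sample N candidate responses, then keep the
# one the reward model scores highest. The `score` callable is a toy
# stand-in for a learned reward model such as a PRM.

def best_of_n(candidates, score):
    """Return the candidate with the highest reward score."""
    return max(candidates, key=score)

# Example with a toy length-based "reward": the longest answer wins.
answers = ["short", "a longer answer", "mid answer"]
print(best_of_n(answers, score=len))  # -> "a longer answer"
```

In practice the reward model scores each intermediate reasoning step rather than only the final answer, which is what distinguishes a process reward model from an outcome reward model.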

Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset

arXiv.org
Recent advancements in Large Multimodal Models (LMMs) have shown promising results in mathematical reasoning within visual contexts, with models approaching human-level performance on existing benchmarks such as MathVista. However, we observe significant ...
Ke Wang   +5 more
semanticscholar   +1 more source

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

International Conference on Learning Representations
We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various ...
Jinheng Xie   +9 more
semanticscholar   +1 more source
