Results 321 to 330 of about 1,322,987
Some of the following articles may not be open access.
Annual Meeting of the Association for Computational Linguistics
Recent advancements have seen Large Language Models (LLMs) and Large Multimodal Models (LMMs) surpassing general human capabilities in various tasks, approaching the proficiency level of human experts across multiple domains.
Chaoqun He +13 more
semanticscholar +1 more source
R1-Onevision: Advancing Generalized Multimodal Reasoning Through Cross-Modal Formalization
IEEE International Conference on Computer Vision
Large Language Models have demonstrated remarkable reasoning capability in complex textual tasks. However, multimodal reasoning, which requires integrating visual and textual information, remains a significant challenge.
Yi Yang +11 more
semanticscholar +1 more source
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Neural Information Processing Systems
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and ...
Shengbang Tong +13 more
semanticscholar +1 more source
Show-o2: Improved Native Unified Multimodal Models
arXiv.org
This paper presents improved native unified multimodal models, i.e., Show-o2, that leverage autoregressive modeling and flow matching. Built upon a 3D causal variational autoencoder space, unified visual representations are constructed through a ...
Jinheng Xie +2 more
semanticscholar +1 more source
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
arXiv.org
By extending the advantage of chain-of-thought (CoT) reasoning in human-like step-by-step processes to multimodal contexts, multimodal CoT (MCoT) reasoning has recently garnered significant research attention, especially in the integration with ...
Yaoting Wang +6 more
semanticscholar +1 more source
Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchmark
International Conference on Machine Learning
The ability to organically reason over and with both text and images is a pillar of human intelligence, yet the ability of Multimodal Large Language Models (MLLMs) to perform such multimodal reasoning remains under-explored.
Yunzhuo Hao +6 more
semanticscholar +1 more source
2000
Chapter 5 discusses multimodal therapy, and how the multimodal approach provides a framework that facilitates systematic treatment selection in a broad-based, comprehensive, and yet highly focused manner. It covers how it respects science and data-driven findings, and it endeavors to use empirically supported methods when possible.
openaire +1 more source
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
arXiv.org
We introduce VisualPRM, an advanced multimodal Process Reward Model (PRM) with 8B parameters, which improves the reasoning abilities of existing Multimodal Large Language Models (MLLMs) across different model scales and families with Best-of-N (BoN ...
Weiyun Wang +14 more
semanticscholar +1 more source
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
arXiv.org
Recent advancements in Large Multimodal Models (LMMs) have shown promising results in mathematical reasoning within visual contexts, with models approaching human-level performance on existing benchmarks such as MathVista. However, we observe significant ...
Ke Wang +5 more
semanticscholar +1 more source
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
International Conference on Learning Representations
We present a unified transformer, i.e., Show-o, that unifies multimodal understanding and generation. Unlike fully autoregressive models, Show-o unifies autoregressive and (discrete) diffusion modeling to adaptively handle inputs and outputs of various ...
Jinheng Xie +9 more
semanticscholar +1 more source

