Results 291 to 300 of about 14,757,385 (333)
GR-3 Technical Report
arXiv.org
We report our recent progress towards building generalist robot policies, the development of GR-3. GR-3 is a large-scale vision-language-action (VLA) model.
Chi-Lam Cheang +20 more
semanticscholar +1 more source
RoboBrain 2.0 Technical Report
arXiv.org
We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments.
Mingyu Cao +50 more
semanticscholar +1 more source
Multilingual E5 Text Embeddings: A Technical Report
arXiv.org
This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023.
Liang Wang +5 more
semanticscholar +1 more source
Step-Audio 2 Technical Report
arXiv.org
This paper presents Step-Audio 2, an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
Boyong Wu +100 more
semanticscholar +1 more source
Ovis2.5 Technical Report
arXiv.org
We present Ovis2.5, a successor to Ovis2 designed for native-resolution visual perception and strong multimodal reasoning. Ovis2.5 integrates a native-resolution vision transformer that processes images at their native, variable resolutions, avoiding the ...
Shiyin Lu +41 more
semanticscholar +1 more source
HunyuanImage 3.0 Technical Report
arXiv.org
We present HunyuanImage 3.0, a native multimodal model that unifies multimodal understanding and generation within an autoregressive framework, with its image generation module publicly available. The achievement of HunyuanImage 3.0 relies on several key ...
Siyu Cao +73 more
semanticscholar +1 more source
InternLM2 Technical Report
arXiv.org
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging.
Zheng Cai +99 more
semanticscholar +1 more source
arXiv.org
As large language models (LLMs) become more capable and widely used, ensuring the safety of their outputs is increasingly critical. Existing guardrail models, though useful in static evaluation settings, face two major limitations in real-world ...
Hai Zhao +42 more
semanticscholar +1 more source
Ovis-U1 Technical Report
arXiv.org
In this report, we introduce Ovis-U1, a 3-billion-parameter unified model that integrates multimodal understanding, text-to-image generation, and image editing capabilities.
Guo-Hua Wang +11 more
semanticscholar +1 more source
Stable LM 2 1.6B Technical Report
arXiv.org
We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B.
Marco Bellagente +18 more
semanticscholar +1 more source