Some of the following articles may not be open access.

GR-3 Technical Report

arXiv.org
We report our recent progress towards building generalist robot policies, the development of GR-3. GR-3 is a large-scale vision-language-action (VLA) model.
Chi-Lam Cheang   +20 more
semanticscholar   +1 more source

RoboBrain 2.0 Technical Report

arXiv.org
We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments.
Mingyu Cao   +50 more
semanticscholar   +1 more source

Multilingual E5 Text Embeddings: A Technical Report

arXiv.org
This technical report presents the training methodology and evaluation results of the open-source multilingual E5 text embedding models, released in mid-2023.
Liang Wang   +5 more
semanticscholar   +1 more source

Step-Audio 2 Technical Report

arXiv.org
This paper presents Step-Audio 2, an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
Boyong Wu   +100 more
semanticscholar   +1 more source

Ovis2.5 Technical Report

arXiv.org
We present Ovis2.5, a successor to Ovis2 designed for native-resolution visual perception and strong multimodal reasoning. Ovis2.5 integrates a native-resolution vision transformer that processes images at their native, variable resolutions, avoiding the
Shiyin Lu   +41 more
semanticscholar   +1 more source

HunyuanImage 3.0 Technical Report

arXiv.org
We present HunyuanImage 3.0, a native multimodal model that unifies multimodal understanding and generation within an autoregressive framework, with its image generation module publicly available. The achievement of HunyuanImage 3.0 relies on several key
Siyu Cao   +73 more
semanticscholar   +1 more source

InternLM2 Technical Report

arXiv.org
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging.
Zheng Cai   +99 more
semanticscholar   +1 more source

Qwen3Guard Technical Report

arXiv.org
As large language models (LLMs) become more capable and widely used, ensuring the safety of their outputs is increasingly critical. Existing guardrail models, though useful in static evaluation settings, face two major limitations in real-world ...
Hai Zhao   +42 more
semanticscholar   +1 more source

Ovis-U1 Technical Report

arXiv.org
In this report, we introduce Ovis-U1, a 3-billion-parameter unified model that integrates multimodal understanding, text-to-image generation, and image editing capabilities.
Guo-Hua Wang   +11 more
semanticscholar   +1 more source

Stable LM 2 1.6B Technical Report

arXiv.org
We introduce StableLM 2 1.6B, the first in a new generation of our language model series. In this technical report, we present in detail the data and training procedure leading to the base and instruction-tuned versions of StableLM 2 1.6B.
Marco Bellagente   +18 more
semanticscholar   +1 more source
