
MuscleX-DI: an integrated data analysis package for X-ray scanning diffraction imaging experiments. [PDF]

open access: yes
J Synchrotron Radiat
Madhurapantula RS   +8 more
europepmc   +1 more source

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

arXiv.org
This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations).
Yujia Qin   +34 more
semanticscholar   +1 more source

ScreenSpot-Pro: GUI Grounding for Professional High-Resolution Computer Use

ACM Multimedia
Recent advancements in Multi-modal Large Language Models (MLLMs) have led to significant progress in developing GUI agents for general tasks such as web browsing and mobile phone use.
Kaixin Li   +7 more
semanticscholar   +1 more source

GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents

arXiv.org
Existing efforts in building Graphical User Interface (GUI) agents largely rely on the training paradigm of supervised fine-tuning on Large Vision-Language Models (LVLMs).
Run Luo, Lu Wang, Wanwei He, Xiaobo Xia
semanticscholar   +1 more source

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

Annual Meeting of the Association for Computational Linguistics
Graphical User Interface (GUI) agents are designed to automate complex tasks on digital devices, such as smartphones and desktops. Most existing GUI agents interact with the environment through extracted structured data, which can be notably lengthy (e.g.
Kanzhi Cheng   +6 more
semanticscholar   +1 more source
