Results 321 to 330 of about 769,446
Some of the following articles may not be open access.

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Annual Meeting of the Association for Computational Linguistics
Graphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality ...
Qiushi Sun   +14 more
semanticscholar   +1 more source

AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents

Annual Meeting of the Association for Computational Linguistics
AI agents have drawn increasing attention, largely for their ability to perceive environments, understand tasks, and autonomously achieve goals. To advance research on AI agents in mobile scenarios, we introduce the Android Multi-annotation EXpo (AMEX), a ...
Yuxiang Chai   +8 more
semanticscholar   +1 more source

DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning

Conference on Empirical Methods in Natural Language Processing
Grounding natural language queries in graphical user interfaces (GUIs) poses unique challenges due to the diversity of visual elements, spatial clutter, and the ambiguity of language.
Hang Wu   +6 more
semanticscholar   +1 more source

GUI:

2010
The operation of multiple systems is unstable. This issue is addressed with a new system consisting of a compensator and unit feedback. The closed-loop system is internally stable, and the system outputs exhibit the desired behaviour, with no undesirable reactions or interactions between input and output.
openaire   +1 more source

GUI Agents with Foundation Models: A Comprehensive Survey

arXiv.org
Recent advances in foundation models, particularly Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), have facilitated the development of intelligent agents capable of performing complex tasks. By leveraging the ability of (M)LLMs ...
Shuai Wang   +9 more
semanticscholar   +1 more source

Guys

2019
Bob Dylan performs "New Orleans Rag," "Blood in my Eyes," "That's Alright Mama," "Sitting on a Barbed Wire Fence," and "If You Gotta Go Go Now"
openaire   +2 more sources

SpiritSight Agent: Advanced GUI Agent with One Look

Computer Vision and Pattern Recognition
Graphical User Interface (GUI) agents demonstrate promising potential in assisting human-computer interaction by automating users' navigation on digital devices.
Zhiyuan Huang   +4 more
semanticscholar   +1 more source

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use

arXiv.org
The recently released model, Claude 3.5 Computer Use, stands out as the first frontier AI model to offer computer use in public beta as a graphical user interface (GUI) agent. As an early beta, its capability in real-world complex environments remains ...
Siyuan Hu   +3 more
semanticscholar   +1 more source

MobileFlow: A Multimodal LLM For Mobile GUI Agent

arXiv.org
Currently, mobile Graphical User Interfaces (GUIs) are ubiquitous in most people's daily lives, and the ongoing evolution of multimodal large-scale models, such as GPT-4V and Qwen-VL-Max, has significantly bolstered the capabilities of GUI ...
Songqin Nong   +6 more
semanticscholar   +1 more source
