Some of the following articles may not be open access.
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Annual Meeting of the Association for Computational Linguistics
Graphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality ...
Qiushi Sun +14 more
semanticscholar +1 more source
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents
Annual Meeting of the Association for Computational Linguistics
AI agents have drawn increasing attention mostly on their ability to perceive environments, understand tasks, and autonomously achieve goals. To advance research on AI agents in mobile scenarios, we introduce the Android Multi-annotation EXpo (AMEX), a ...
Yuxiang Chai +8 more
semanticscholar +1 more source
DiMo-GUI: Advancing Test-time Scaling in GUI Grounding via Modality-Aware Visual Reasoning
Conference on Empirical Methods in Natural Language Processing
Grounding natural language queries in graphical user interfaces (GUIs) poses unique challenges due to the diversity of visual elements, spatial clutter, and the ambiguity of language.
Hang Wu +6 more
semanticscholar +1 more source
2010
The operation of multiple systems is not stable. This issue is addressed using a new system consisting of a compensator and unity feedback. The closed-loop system is internally stable, and the system outputs behave as desired. No undesirable interactions between input and output are observed.
openaire +1 more source
GUI Agents with Foundation Models: A Comprehensive Survey
arXiv.org
Recent advances in foundation models, particularly Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs), have facilitated the development of intelligent agents capable of performing complex tasks. By leveraging the ability of (M)LLMs
Shuai Wang +9 more
semanticscholar +1 more source
2019
Bob Dylan performs "New Orleans Rag," "Blood in my Eyes," "That's Alright Mama," "Sitting on a Barbed Wire Fence," and "If You Gotta Go Go Now"
openaire +2 more sources
SpiritSight Agent: Advanced GUI Agent with One Look
Computer Vision and Pattern Recognition
Graphical User Interface (GUI) agents demonstrate promising potential in assisting human-computer interaction, automating human user’s navigation on digital devices.
Zhiyuan Huang +4 more
semanticscholar +1 more source
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
arXiv.org
The recently released model, Claude 3.5 Computer Use, stands out as the first frontier AI model to offer computer use in public beta as a graphical user interface (GUI) agent. As an early beta, its capability in the real-world complex environment remains
Siyuan Hu +3 more
semanticscholar +1 more source
MobileFlow: A Multimodal LLM For Mobile GUI Agent
arXiv.org
Currently, the integration of mobile Graphical User Interfaces (GUIs) is ubiquitous in most people's daily lives. And the ongoing evolution of multimodal large-scale models, such as GPT-4v, Qwen-VL-Max, has significantly bolstered the capabilities of GUI
Songqin Nong +6 more
semanticscholar +1 more source
ORTEP-3 for Windows - a version of ORTEP-III with a Graphical User Interface (GUI)
1997
L. Farrugia
semanticscholar +1 more source

