Some of the following articles may not be open access.

The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections

arXiv.org
A Large Language Model (LLM) powered GUI agent is a specialized autonomous system that performs tasks on the user's behalf according to high-level instructions.
Chaoran Chen   +10 more
semanticscholar   +1 more source

ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay

arXiv.org
Training large language models (LLMs) as interactive agents for controlling graphical user interfaces (GUIs) presents a unique challenge to optimize long-horizon action sequences with multimodal feedback from complex environments. While recent works have
Fanbin Lu   +4 more
semanticscholar   +1 more source

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

arXiv.org
Recent Graphical User Interface (GUI) agents replicate the R1-Zero paradigm, coupling online Reinforcement Learning (RL) with explicit chain-of-thought reasoning prior to object grounding and thereby achieving substantial performance gains. In this paper,
Yuqi Zhou   +5 more
semanticscholar   +1 more source

GUI-G2: Gaussian Reward Modeling for GUI Grounding

arXiv.org
Graphical User Interface (GUI) grounding maps natural language instructions to precise interface locations for autonomous interaction. Current reinforcement learning approaches use binary rewards that treat elements as hit-or-miss targets, creating ...
Fei Tang   +11 more
semanticscholar   +1 more source

AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability.
Zhong Zhang   +24 more
semanticscholar   +1 more source

AppAgentX: Evolving GUI Agents as Proficient Smartphone Users

arXiv.org
Recent advancements in Large Language Models (LLMs) have led to the development of intelligent LLM-based agents capable of interacting with graphical user interfaces (GUIs).
Wenjia Jiang   +4 more
semanticscholar   +1 more source

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have led to a surge of autonomous GUI agents for smartphones, which complete a task triggered by natural language by predicting a sequence of API actions.
Jiwen Zhang   +7 more
semanticscholar   +1 more source

Good guys and bad guys

ECTJ, 1986
ECTJ, Vol. 34, No. 1, pages 3-7, ISSN 0148-5806. I recently served as review editor on an article submitted to ECTJ by Richard Clark titled "Evidence for Confounding in Computer-Based Instruction Studies: Analyzing the Meta Analyses" (Clark, 1985). While I fussed over some aspects of the first draft, I thought Clark did a good job of highlighting some ...
openaire   +1 more source

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Computer Vision and Pattern Recognition
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity. While most agents are language-based, relying on closed-source API with text-rich meta-information (e.g., HTML or accessibility tree),
Kevin Qinghong Lin   +8 more
semanticscholar   +1 more source

A Survey on (M)LLM-Based GUI Agents

arXiv.org
Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction, evolving from rule-based automation scripts to sophisticated AI-driven systems capable of understanding and executing complex interface ...
Fei Tang   +14 more
semanticscholar   +1 more source
