Some of the following articles may not be open access.
The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections
arXiv.org
A Large Language Model (LLM) powered GUI agent is a specialized autonomous system that performs tasks on the user's behalf according to high-level instructions.
Chaoran Chen +10 more
semanticscholar +1 more source
ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay
arXiv.org
Training large language models (LLMs) as interactive agents for controlling graphical user interfaces (GUIs) presents a unique challenge to optimize long-horizon action sequences with multimodal feedback from complex environments. While recent works have
Fanbin Lu +4 more
semanticscholar +1 more source
GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents
arXiv.org
Recent Graphical User Interface (GUI) agents replicate the R1-Zero paradigm, coupling online Reinforcement Learning (RL) with explicit chain-of-thought reasoning prior to object grounding and thereby achieving substantial performance gains. In this paper,
Yuqi Zhou +5 more
semanticscholar +1 more source
GUI-G2: Gaussian Reward Modeling for GUI Grounding
arXiv.org
Graphical User Interface (GUI) grounding maps natural language instructions to precise interface locations for autonomous interaction. Current reinforcement learning approaches use binary rewards that treat elements as hit-or-miss targets, creating ...
Fei Tang +11 more
semanticscholar +1 more source
AgentCPM-GUI: Building Mobile-Use Agents with Reinforcement Fine-Tuning
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
The recent progress of large language model agents has opened new possibilities for automating tasks through graphical user interfaces (GUIs), especially in mobile environments where intelligent interaction can greatly enhance usability.
Zhong Zhang +24 more
semanticscholar +1 more source
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users
arXiv.org
Recent advancements in Large Language Models (LLMs) have led to the development of intelligent LLM-based agents capable of interacting with graphical user interfaces (GUIs).
Wenjia Jiang +4 more
semanticscholar +1 more source
Android in the Zoo: Chain-of-Action-Thought for GUI Agents
Conference on Empirical Methods in Natural Language Processing
Large language model (LLM) leads to a surge of autonomous GUI agents for smartphone, which completes a task triggered by natural language through predicting a sequence of actions of API.
Jiwen Zhang +7 more
semanticscholar +1 more source
ECTJ, 1986
ECTJ, VOL. 34, NO. 1, PAGES 3-7, ISSN 0148-5806
I recently served as review editor on an article submitted to ECTJ by Richard Clark titled "Evidence for Confounding in Computer-Based Instruction Studies: Analyzing the Meta Analyses," (Clark, 1985). While I fussed over some aspects of the first draft, I thought Clark did a good job of highlighting some ...
openaire +1 more source
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Computer Vision and Pattern Recognition
Building Graphical User Interface (GUI) assistants holds significant promise for enhancing human workflow productivity. While most agents are language-based, relying on closed-source API with text-rich meta-information (e.g., HTML or accessibility tree),
Kevin Qinghong Lin +8 more
semanticscholar +1 more source
A Survey on (M)LLM-Based GUI Agents
arXiv.org
Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction, evolving from rule-based automation scripts to sophisticated AI-driven systems capable of understanding and executing complex interface ...
Fei Tang +14 more
semanticscholar +1 more source

