Approximate policy iteration

Approximate Policy Iteration Schemes: A Comparison [PDF]

2014
We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration, Conservative Policy Iteration (CPI ...
Scherrer, Bruno
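All of the schemes compared here are approximate variants of exact policy iteration, which alternates policy evaluation with greedy improvement. For reference only, here is a minimal sketch of exact tabular policy iteration on a hypothetical two-state, two-action discounted MDP. The MDP, the `(probability, next_state, reward)` encoding and the function names are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: P[s][a] is a list of (probability, next_state, reward) triples.
Transition = Tuple[float, int, float]
MDP = Dict[int, Dict[int, List[Transition]]]


def q_value(P: MDP, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """One-step lookahead: expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate(P: MDP, policy: Dict[int, int], gamma: float, tol: float = 1e-10) -> Dict[int, float]:
    """Iterative policy evaluation: sweep the fixed policy's Bellman equation until it settles."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_iteration(P: MDP, gamma: float) -> Tuple[Dict[int, int], Dict[int, float]]:
    policy = {s: 0 for s in P}  # arbitrary initial policy: action 0 everywhere
    while True:
        V = evaluate(P, policy, gamma)
        # Greedy improvement with respect to the evaluated value function.
        improved = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
        if improved == policy:  # the policy no longer changes
            return policy, V
        policy = improved


# Toy MDP: in state 0, action 1 moves to state 1; in state 1, action 0 stays and earns reward 1.
TOY: MDP = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 1, 1.0)], 1: [(1.0, 0, 0.0)]},
}

policy, V = policy_iteration(TOY, gamma=0.9)
print(policy, V)  # expected policy: {0: 1, 1: 0}
```

Evaluation here iterates the Bellman equation to a tolerance instead of solving the linear system exactly. Approximate schemes relax the evaluation step, the improvement step, or both.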

Approximate policy iteration: A survey and some new methods [PDF]

Journal of Control Theory and Applications, 2010
We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality.
Bertsekas, Dimitri P.

Rollout sampling approximate policy iteration [PDF]

Machine Learning, 2008
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions which focus on policy representation using classifiers and address policy learning as a supervised learning problem.
Dimitrakakis, Christos +2 more
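The basic loop behind classifier-based API works as follows. First, estimate Q(s, a) of the current policy at sampled states by Monte Carlo rollouts. Then keep the states where one action clearly dominates as labelled examples. Finally, fit a classifier, which becomes the next policy. Below is a minimal sketch under stated assumptions: a hypothetical 10-state chain, a fixed number of rollouts per state-action pair, and a 1-nearest-neighbour classifier. It does not implement the rollout-allocation strategy the paper itself proposes.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical chain MDP: states 0..9, actions -1 (left) / +1 (right); entering state 9 pays 1 and ends.
GOAL = 9
ACTIONS = (-1, +1)
Policy = Callable[[int], int]


def step(s: int, a: int, rng: random.Random, slip: float) -> Tuple[int, float, bool]:
    """Move along the chain; with probability `slip` the chosen action is reversed."""
    if rng.random() < slip:
        a = -a
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL


def rollout_return(s: int, a: int, policy: Policy, gamma: float, horizon: int,
                   rng: random.Random, slip: float) -> float:
    """Discounted return of taking `a` in `s`, then following `policy` for up to `horizon` steps."""
    s, r, done = step(s, a, rng, slip)
    total, discount = r, 1.0
    for _ in range(horizon):
        if done:
            break
        discount *= gamma
        s, r, done = step(s, policy(s), rng, slip)
        total += discount * r
    return total


def nearest_neighbour_policy(labels: Dict[int, int]) -> Policy:
    """Classifier used as a policy: copy the label of the closest labelled state (ties: smaller state)."""
    def policy(s: int) -> int:
        return labels[min(labels, key=lambda x: (abs(x - s), x))]
    return policy


def always_left(s: int) -> int:
    return -1


def rollout_classification_api(iterations: int = 5, rollouts: int = 10, gamma: float = 0.9,
                               horizon: int = 30, margin: float = 1e-6, slip: float = 0.0,
                               seed: int = 0) -> Policy:
    rng = random.Random(seed)
    policy: Policy = always_left  # initial policy
    for _ in range(iterations):
        labels: Dict[int, int] = {}
        for s in range(GOAL):  # every non-goal state is used as a training input
            q = {a: sum(rollout_return(s, a, policy, gamma, horizon, rng, slip)
                        for _ in range(rollouts)) / rollouts
                 for a in ACTIONS}
            best = max(ACTIONS, key=lambda a: q[a])
            worst = min(ACTIONS, key=lambda a: q[a])
            if q[best] - q[worst] > margin:  # label only clearly dominant actions
                labels[s] = best
        if labels:  # otherwise keep the current policy
            policy = nearest_neighbour_policy(labels)
    return policy


learned = rollout_classification_api()
print([learned(s) for s in range(GOAL)])
```

With the default `slip=0.0` the chain is deterministic, so a single rollout per pair would suffice. A positive `slip` makes the estimates noisy. In that setting, the way rollouts are spread across states starts to matter.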

Adaptive Approximate Policy Iteration [PDF]

2020
Model-free reinforcement learning algorithms combined with value function approximation have recently achieved impressive performance in a variety of application domains. However, the theoretical understanding of such algorithms is limited, and existing results are largely focused on episodic or discounted Markov decision processes (MDPs). In this work, ...
Hao, Botao +4 more

Approximate policy iteration using regularised Bellman residuals minimisation [PDF]

Journal of Experimental & Theoretical Artificial Intelligence, 2015
Reinforcement Learning (RL) provides a general methodology to solve complex uncertain decision problems, which are very challenging in many real-world applications. The RL problem is modeled as a Markov Decision Process (MDP), a model that has been deeply studied in the literature. We consider Policy Iteration (PI) algorithms for RL, which iteratively evaluate and improve control ...
Esposito, Gennaro, Martín Muñoz, Mario
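As background for the residual-minimisation idea, here is a minimal sketch of ridge-regularised Bellman residual minimisation for evaluating one fixed policy with linear features, V(s) ≈ φ(s)ᵀw. Minimising ‖Φw − (r + γPΦw)‖² + λ‖w‖² gives the normal equations (ΨᵀΨ + λI)w = Ψᵀr, where Ψ = Φ − γPΦ. The sketch makes two assumptions. First, the policy's transition matrix P and reward vector r are known; with sampled transitions, naive residual minimisation is biased (the double-sampling problem). Second, it uses a plain ℓ2 penalty. The paper's own estimator and regulariser may differ, and the chain and function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (A square, non-singular)."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def regularized_brm(P: Matrix, r: List[float], phi: Matrix, gamma: float, lam: float) -> List[float]:
    """Minimise ||Phi w - (r + gamma P Phi w)||^2 + lam ||w||^2 for one fixed policy."""
    n, k = len(P), len(phi[0])
    # psi(s) = phi(s) - gamma * E[phi(s') | s], so the Bellman residual at s is psi(s).w - r(s).
    psi = [[phi[s][j] - gamma * sum(P[s][t] * phi[t][j] for t in range(n)) for j in range(k)]
           for s in range(n)]
    # Normal equations: (Psi^T Psi + lam I) w = Psi^T r.
    A = [[sum(psi[s][i] * psi[s][j] for s in range(n)) + (lam if i == j else 0.0) for j in range(k)]
         for i in range(k)]
    b = [sum(psi[s][i] * r[s] for s in range(n)) for i in range(k)]
    return solve(A, b)


# Hypothetical 3-state chain under a fixed policy: 0 -> 1 -> 2, and state 2 is absorbing with reward 1.
P = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
r = [0.0, 0.0, 1.0]
one_hot = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
w_exact = regularized_brm(P, r, one_hot, gamma=0.5, lam=0.0)  # true values are [0.5, 1.0, 2.0]
w_ridge = regularized_brm(P, r, one_hot, gamma=0.5, lam=0.1)
print(w_exact, w_ridge)
```

With one-hot features and λ = 0, the minimiser reproduces the exact value function. A positive λ shrinks w toward zero, trading Bellman-residual fit for a smaller weight vector.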

Sarsa(λ)-Based Logistics Planning Approximated by Value Function with Policy Iteration [PDF]

Journal of Algorithms & Computational Technology, 2015
The logistics planning problem has been extensively investigated for a long time. However, with the increasing number of stochastic events occurring on the road, an increasing number of stochastic factors should be taken into consideration. A dynamic approach is ...
Yu Tang
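For readers unfamiliar with the learner named in the title, here is a minimal tabular Sarsa(λ) sketch with accumulating eligibility traces and ε-greedy exploration. The five-state chain task, the hyperparameters and the function names are hypothetical. The paper combines Sarsa(λ) with value function approximation for logistics planning, and that part is not reproduced here.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Tuple

# Hypothetical task: states 0..4 on a line, actions -1/+1; entering state 4 pays 1 and ends the episode.
GOAL = 4
ACTIONS = (-1, +1)
QTable = DefaultDict[Tuple[int, int], float]


def env_step(s: int, a: int) -> Tuple[int, float, bool]:
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL


def epsilon_greedy(Q: QTable, s: int, eps: float, rng: random.Random) -> int:
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    best = max(Q[(s, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(s, a)] == best])  # random tie-break


def sarsa_lambda(episodes: int = 2000, alpha: float = 0.1, gamma: float = 0.9, lam: float = 0.8,
                 eps: float = 0.1, max_steps: int = 200, seed: int = 0) -> QTable:
    rng = random.Random(seed)
    Q: QTable = defaultdict(float)
    for _ in range(episodes):
        traces: QTable = defaultdict(float)  # eligibility traces, reset every episode
        s = 0
        a = epsilon_greedy(Q, s, eps, rng)
        for _ in range(max_steps):
            s2, r, done = env_step(s, a)
            if done:
                delta = r - Q[(s, a)]  # terminal transition: nothing to bootstrap from
            else:
                a2 = epsilon_greedy(Q, s2, eps, rng)  # on-policy: next action from the same policy
                delta = r + gamma * Q[(s2, a2)] - Q[(s, a)]
            traces[(s, a)] += 1.0  # accumulating trace
            for key in list(traces):
                Q[key] += alpha * delta * traces[key]
                traces[key] *= gamma * lam
            if done:
                break
            s, a = s2, a2
    return Q


Q = sarsa_lambda()
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print(greedy)
```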

Algorithms and Bounds for Rollout Sampling Approximate Policy Iteration [PDF]

2008
14 pages, presented at EWRL ...
Dimitrakakis, C., Lagoudakis, M.G.

Approximate Midpoint Policy Iteration for Linear Quadratic Control [PDF]

2020
We present a midpoint policy iteration algorithm to solve linear quadratic optimal control problems in both model-based and model-free settings. The algorithm is a variation of Newton's method, and we show that in the model-based setting it achieves cubic convergence, which is superior to standard policy iteration and policy gradient algorithms that ...
Gravell, Benjamin, Shames, Iman, Summers, Tyler +2 more
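For context, the standard policy iteration that the midpoint method is compared against (Hewer's algorithm, itself a Newton-type iteration on the Riccati equation) alternates two steps: evaluating a linear gain and improving it greedily. Below is a minimal model-based sketch for a hypothetical scalar discrete-time system x_{t+1} = a x_t + b u_t with stage cost q x² + r u² and policy u = −k x. It does not implement the paper's midpoint update, and the system parameters are illustrative.

```python
import math


def evaluate_gain(k: float, a: float, b: float, q: float, r: float) -> float:
    """Policy evaluation: cost-to-go J(x) = P x^2 of the linear policy u = -k x.

    P solves P = q + r k^2 + P (a - b k)^2, which requires |a - b k| < 1.
    """
    closed_loop = a - b * k
    if abs(closed_loop) >= 1.0:
        raise ValueError("gain k does not stabilise the system")
    return (q + r * k * k) / (1.0 - closed_loop ** 2)


def improve_gain(P: float, a: float, b: float, r: float) -> float:
    """Policy improvement: minimise r u^2 + P (a x + b u)^2 over u, which gives u = -k x."""
    return a * b * P / (r + b * b * P)


def lqr_policy_iteration(a: float, b: float, q: float, r: float, k0: float, iterations: int = 20):
    k = k0  # the initial gain must be stabilising
    for _ in range(iterations):
        k = improve_gain(evaluate_gain(k, a, b, q, r), a, b, r)
    return k, evaluate_gain(k, a, b, q, r)


def riccati_residual(P: float, a: float, b: float, q: float, r: float) -> float:
    """Zero at the optimal P of the scalar discrete-time algebraic Riccati equation."""
    return q + a * a * P - (a * b * P) ** 2 / (r + b * b * P) - P


k, P = lqr_policy_iteration(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.0)
print(k, P, riccati_residual(P, 1.2, 1.0, 1.0, 1.0))
```

The loop must start from a stabilising gain, and `evaluate_gain` raises an error otherwise. With b = q = r = 1 the Riccati equation reduces to P² − a²P − 1 = 0, which gives a closed form to check against.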

Solving Common-Payoff Games with Approximate Policy Iteration [PDF]

Proceedings of the AAAI Conference on Artificial Intelligence, 2021
For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult: computing even an epsilon-optimal joint policy is a NEXP-complete problem.
Sokota, Samuel +8 more

An approximate policy iteration viewpoint of actor–critic algorithms [PDF]

Automatica, 2022
In this work, we consider policy-based methods for solving the reinforcement learning problem, and establish the sample complexity guarantees. A policy-based algorithm typically consists of an actor and a critic. We consider using various policy update rules for the actor, including the celebrated natural policy gradient.
Chen, Zaiwei, Maguluri, Siva Theja
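The link between actor-critic methods and approximate policy iteration can be illustrated with a minimal sketch. The critic evaluates Q of the current policy exactly. The actor then applies the natural policy gradient update for a tabular softmax policy, which in that parametrisation takes the multiplicative-weights form π'(a|s) ∝ π(a|s) exp(η Q(s,a) / (1 − γ)). As η grows, this step approaches the greedy step of policy iteration. The two-state MDP, the step size and the exact critic are assumptions made for illustration. The paper analyses sampled critics and several actor update rules.

```python
import math
from typing import List

# Hypothetical deterministic MDP with 2 states and 2 actions: NEXT[s][a] and REWARD[s][a].
NEXT = [[0, 1], [1, 0]]
REWARD = [[0.0, 0.0], [1.0, 0.0]]
GAMMA = 0.9
Table = List[List[float]]


def critic(pi: Table, sweeps: int = 500) -> Table:
    """Exact critic (assumption): iterate Q(s,a) = R(s,a) + gamma * sum_a' pi(a'|s') Q(s',a')."""
    Q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(sweeps):
        V = [sum(pi[s][a] * Q[s][a] for a in range(2)) for s in range(2)]
        Q = [[REWARD[s][a] + GAMMA * V[NEXT[s][a]] for a in range(2)] for s in range(2)]
    return Q


def npg_actor_step(pi: Table, Q: Table, eta: float) -> Table:
    """Tabular-softmax NPG step: pi'(a|s) proportional to pi(a|s) * exp(eta * Q(s,a) / (1 - gamma))."""
    new_pi = []
    for s in range(2):
        top = max(Q[s])  # a per-state shift leaves the normalised result unchanged
        weights = [pi[s][a] * math.exp(eta * (Q[s][a] - top) / (1.0 - GAMMA)) for a in range(2)]
        total = sum(weights)
        new_pi.append([w / total for w in weights])
    return new_pi


def actor_critic(iterations: int = 50, eta: float = 0.1) -> Table:
    pi = [[0.5, 0.5], [0.5, 0.5]]  # uniform initial policy
    for _ in range(iterations):
        pi = npg_actor_step(pi, critic(pi), eta)
    return pi


pi = actor_critic()
print(pi)
```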
