Approximate policy iteration

Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies

2013
We consider approximate dynamic programming for the infinite-horizon stationary $\gamma$-discounted optimal control problem formalized by Markov Decision Processes.
Lesner, Boris, Scherrer, Bruno

Projections for Approximate Policy Iteration Algorithms

2019
Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded using a function approximator, and it has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring that the policy return increases during a policy update often requires constraining the change in ...
Akrour, R. +3 more

Approximate Policy Iteration with Linear Action Models

Proceedings of the AAAI Conference on Artificial Intelligence, 2021
In this paper we consider the problem of finding a good policy given some batch data. We propose a new approach, LAM-API, that first builds a so-called linear action model (LAM) from the data and then uses the learned model and the collected data in approximate policy iteration (API) to find a good policy. A natural choice for the ...
Hengshuai Yao, Csaba Szepesvari

Approximate Policy Iteration for Semi-Markov Control Revisited

Procedia Computer Science, 2011
The semi-Markov decision process can be solved via reinforcement learning without generating its transition model. We briefly review the existing algorithms based on approximate policy iteration (API) for solving this problem for discounted and average reward under the infinite horizon.
Abhijit Gosavi

Optimization Issues in KL-Constrained Approximate Policy Iteration

2021
Many reinforcement learning algorithms can be seen as versions of approximate policy iteration (API). While standard API often performs poorly, it has been shown that learning can be stabilized by regularizing each policy update by the KL-divergence to the previous policy.
Lazić, Nevena +4 more
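
As an illustration of the KL-regularized policy update mentioned in the abstract above, here is a minimal tabular sketch. It is not taken from the paper: the function name `kl_regularized_update`, the temperature `eta`, and the tabular setting are assumptions made for this example. It uses the standard closed form of maximizing expected Q-value minus `eta` times the KL-divergence to the previous policy, which gives a new policy proportional to `pi_old * exp(q / eta)`.

```python
import numpy as np

def kl_regularized_update(pi_old, q, eta):
    """One KL-regularized greedy step for a tabular policy (illustrative only).

    pi_old: (S, A) array of strictly positive action probabilities.
    q:      (S, A) array of action-value estimates for pi_old.
    eta:    regularization strength (> 0); larger eta keeps pi_new closer to pi_old.
    """
    # Work in log space and subtract the row maximum for numerical stability.
    logits = np.log(pi_old) + q / eta
    logits -= logits.max(axis=1, keepdims=True)
    pi_new = np.exp(logits)
    # Normalize each state's row into a probability distribution.
    return pi_new / pi_new.sum(axis=1, keepdims=True)

# Hypothetical one-state, two-action example.
pi_old = np.array([[0.5, 0.5]])
q = np.array([[1.0, 0.0]])
pi_new = kl_regularized_update(pi_old, q, eta=1.0)
```

With a uniform previous policy the update reduces to a softmax over `q / eta`; as `eta` grows, the new policy stays close to the previous one.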

Privacy-preserving ADP for secure tracking control of AVRs against unreliable communication

Frontiers in Neurorobotics
In this study, we developed an encrypted guaranteed-cost tracking control scheme for autonomous vehicles or robots (AVRs), by using the adaptive dynamic programming technique.
Kun Zhang, Kezhen Han, Zhijian Hu, Guoqiang Tan +3 more

On approximate policy iteration for continuous-time systems

Proceedings of the 44th IEEE Conference on Decision and Control, 2006
We propose a new algorithm for nonlinear feedback synthesis. The algorithm computes suboptimal solutions, with bounds on suboptimality, to the Hamilton-Jacobi-Bellman equation. For systems that are modeled with polynomials, the computations can be done efficiently via semidefinite programming.
Wernrud, Andreas, Rantzer, Anders

Solving Common-Payoff Games with Approximate Policy Iteration

2020
For artificially intelligent learning systems to be deployed widely in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is challenging. Even finding approximately optimal joint policies of decentralized partially observable Markov decision processes (Dec-POMDPs), a standard formalism ...
Samuel Sokota

Approximate Modified Policy Iteration

2012
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy iteration and value iteration methods as special cases. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite.
Scherrer, Bruno +3 more
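
To make concrete how MPI relates to value iteration and policy iteration, here is a minimal sketch of exact, tabular MPI. It is not the approximate algorithm studied in the paper: the function name `modified_policy_iteration`, the array layout of `P` and `R`, and the toy two-state MDP are all assumptions made for this example. Each iteration takes a greedy step and then applies the Bellman operator of the greedy policy `m` times; `m = 1` corresponds to value iteration, and large `m` approaches policy iteration.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma, m, iters=200):
    """Exact tabular MPI sketch (illustrative only).

    P: (A, S, S) array, P[a, s, t] = probability of moving from s to t under a.
    R: (S, A) array of expected immediate rewards.
    m: number of applications of the greedy policy's Bellman operator per iteration.
    """
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    idx = np.arange(n_states)
    for _ in range(iters):
        # Greedy step: q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * v[t].
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        policy = q.argmax(axis=1)
        # Partial evaluation: apply the operator of the greedy policy m times.
        P_pi = P[policy, idx]   # (S, S) transition matrix under the greedy policy
        r_pi = R[idx, policy]   # (S,) rewards under the greedy policy
        for _ in range(m):
            v = r_pi + gamma * P_pi @ v
    return v, policy

# Hypothetical deterministic MDP: action 0 stays, action 1 switches state.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
v_vi, pi_vi = modified_policy_iteration(P, R, gamma=0.9, m=1)    # value iteration
v_mpi, pi_mpi = modified_policy_iteration(P, R, gamma=0.9, m=50) # near policy iteration
```

On this toy problem both settings of `m` should agree on the value function and the greedy policy; they differ only in how much evaluation work is done per greedy step.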

Formally Verified Approximate Policy Iteration

Proceedings of the AAAI Conference on Artificial Intelligence
We present a methodology based on interactive theorem proving that facilitates the development of verified implementations of algorithms for solving factored Markov Decision Processes. As a case study, we formally verify an algorithm for approximate policy iteration in the proof assistant Isabelle/HOL.
Schäffeler, Maximilian, Abdulaziz, Mohammad +1 more