Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies [PDF]
We consider approximate dynamic programming for the infinite-horizon stationary $\gamma$-discounted optimal control problem formalized by Markov Decision Processes.
Lesner, Boris, Scherrer, Bruno
core +7 more sources
Projections for Approximate Policy Iteration Algorithms [PDF]
Approximate policy iteration is a class of reinforcement learning (RL) algorithms in which the policy is encoded using a function approximator; it has been especially prominent in RL with continuous action spaces. In this class of RL algorithms, ensuring that the policy return increases during a policy update often requires constraining the change in ...
Akrour, R. +3 more
openaire +4 more sources
Approximate Policy Iteration with Linear Action Models
In this paper we consider the problem of finding a good policy given some batch data. We propose a new approach, LAM-API, that first builds a so-called linear action model (LAM) from the data and then uses the learned model and the collected data in approximate policy iteration (API) to find a good policy. A natural choice for the ...
Hengshuai Yao, Csaba Szepesvari
openaire +3 more sources
Approximate Policy Iteration for Semi-Markov Control Revisited
The semi-Markov decision process can be solved via reinforcement learning without generating its transition model. We briefly review the existing algorithms based on approximate policy iteration (API) for solving this problem for discounted and average reward under the infinite horizon.
Abhijit Gosavi
openaire +2 more sources
Optimization Issues in KL-Constrained Approximate Policy Iteration [PDF]
Many reinforcement learning algorithms can be seen as versions of approximate policy iteration (API). While standard API often performs poorly, it has been shown that learning can be stabilized by regularizing each policy update by the KL-divergence to the previous policy.
Lazić, Nevena +4 more
+5 more sources
Privacy-preserving ADP for secure tracking control of AVRs against unreliable communication [PDF]
In this study, we developed an encrypted guaranteed-cost tracking control scheme for autonomous vehicles or robots (AVRs), by using the adaptive dynamic programming technique.
Kun Zhang +3 more
doaj +2 more sources
On approximate policy iteration for continuous-time systems [PDF]
We propose a new algorithm for nonlinear feedback synthesis. The algorithm computes suboptimal solutions, with bounds on suboptimality, to the Hamilton-Jacobi-Bellman equation. For systems that are modeled with polynomials, the computations can be done efficiently via semidefinite programming.
Wernrud, Andreas, Rantzer, Anders
openaire +4 more sources
Solving Common-Payoff Games with Approximate Policy Iteration
For artificially intelligent learning systems to be deployed widely in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is challenging. Even finding approximately optimal joint policies of decentralized partially observable Markov decision processes (Dec-POMDPs), a standard formalism ...
Samuel Sokota
openaire +2 more sources
Approximate Modified Policy Iteration
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated policy and value iteration methods. Despite its generality, MPI has not been thoroughly studied, especially its approximate form, which is used when the state and/or action spaces are large or infinite.
Scherrer, Bruno +3 more
openaire +5 more sources
Formally Verified Approximate Policy Iteration
We present a methodology based on interactive theorem proving that facilitates the development of verified implementations of algorithms for solving factored Markov Decision Processes. As a case study, we formally verify an algorithm for approximate policy iteration in the proof assistant Isabelle/HOL.
Schäffeler, Maximilian +1 more
openaire +4 more sources

