Results 21 to 30 of about 93,202
Approximate Policy Iteration for Markov Control Revisited
Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API) is another RL technique, not as widely used as Q-Learning, based on modified policy ... (a minimal sketch of the model-free update follows this entry).
Abhijit Gosavi
openaire +2 more sources
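As a point of reference for the distinction this abstract draws, the sketch below shows the tabular Q-learning update, which is value-iteration-style and uses only sampled transitions, so the MDP's transition probabilities never appear explicitly; approximate policy iteration would instead alternate an approximate policy evaluation step with a greedy improvement step. The env interface, hyperparameters, and function name are assumptions for illustration, not Gosavi's code.

    # Minimal tabular Q-learning sketch (illustrative only).
    import random
    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.95, epsilon=0.1):
        """Assumes env.reset() -> s and env.step(a) -> (s_next, r, done)."""
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                if random.random() < epsilon:
                    a = random.randrange(n_actions)
                else:
                    a = int(np.argmax(Q[s]))
                s_next, r, done = env.step(a)
                # backup on a single sampled transition; no transition matrix needed
                target = r + (0.0 if done else gamma * np.max(Q[s_next]))
                Q[s, a] += alpha * (target - Q[s, a])
                s = s_next
        return Q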
Softened approximate policy iteration for Markov games
This paper reports theoretical and empirical investigations on the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov Games. First, it reveals that state-of-the-art algorithms can be derived by the direct application of Newton's method to different norms of the OBR (a standard form of this residual is written out after this entry).
Pérolat, Julien +4 more
openaire +3 more sources
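To make the quantity concrete, one standard way to write the Optimal Bellman Residual and a Newton step on it is given below; the symbols (a parameterized Q-function Q_theta, the optimal minimax Bellman operator T*, and the residual norm J_p) are assumed for this sketch and need not match the paper's notation.

    % assumed notation: Q_theta parameterized Q-function, T* optimal Bellman operator
    \[
      \mathcal{J}_p(\theta) \;=\; \bigl\lVert \mathcal{T}^{*} Q_{\theta} - Q_{\theta} \bigr\rVert_{p}^{p}
    \]
    \[
      \theta_{k+1} \;=\; \theta_{k} - \bigl(\nabla^{2}_{\theta}\,\mathcal{J}_p(\theta_{k})\bigr)^{-1}\,
      \nabla_{\theta}\,\mathcal{J}_p(\theta_{k})
      \quad \text{(Newton step; quasi-Newton methods replace the Hessian with an approximation)}
    \]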
Approximate policy iteration using neural networks for storage problems [PDF]
26 pages, 5 ...
Dokka, Trivikram, Frimpong, Richlove
openaire +3 more sources
Algorithms of approximate dynamic programming for hydro scheduling [PDF]
In hydro scheduling, unit commitment is a complex sub-problem. This paper proposes a new approximate dynamic programming technique to solve unit commitment.
Parvez Iram, Shen Jianjian
doaj +1 more source
Error propagation for approximate policy and value iteration
We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy. We quantify the performance loss as the Lp norm of the approximation error/Bellman residual at each iteration (a classical bound of this type is sketched after this entry).
Farahmand, Amir Massoud +2 more
+6 more sources
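For context, the classical sup-norm error propagation bound for approximate policy iteration (Bertsekas and Tsitsiklis) has the form below; the paper above refines this style of result to Lp norms, with different constants.

    \[
      \limsup_{k \to \infty} \bigl\lVert V^{*} - V^{\pi_k} \bigr\rVert_{\infty}
      \;\le\; \frac{2\gamma}{(1-\gamma)^{2}}\,
      \limsup_{k \to \infty} \bigl\lVert \epsilon_k \bigr\rVert_{\infty}
    \]

Here epsilon_k is the approximation error of the value estimate at iteration k and gamma is the discount factor.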
A Nearer Optimal and Faster Trained Value Iteration ADP for Discrete-Time Nonlinear Systems
Adaptive dynamic programming (ADP) is generally implemented using three neural networks: a model network, an action network, and a critic network. In conventional value iteration ADP, the model network is initialized randomly and trained by the ... (an illustrative sketch of this three-network structure follows this entry).
Junping Hu +5 more
doaj +1 more source
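To illustrate the three-network structure the abstract refers to, a rough value-iteration ADP sweep is sketched below; the network sizes, cost function, and training loop are placeholders (in practice the model network would first be trained on system data), and nothing here is the authors' implementation.

    # Illustrative value-iteration ADP structure with model, critic, and action networks.
    import torch
    import torch.nn as nn

    def mlp(inp, out, hidden=32):
        return nn.Sequential(nn.Linear(inp, hidden), nn.Tanh(), nn.Linear(hidden, out))

    state_dim, action_dim, gamma = 4, 1, 0.95
    model_net = mlp(state_dim + action_dim, state_dim)   # predicts the next state
    critic_net = mlp(state_dim, 1)                       # approximates the value function
    action_net = mlp(state_dim, action_dim)              # approximates the control policy
    opt_critic = torch.optim.Adam(critic_net.parameters(), lr=1e-3)
    opt_action = torch.optim.Adam(action_net.parameters(), lr=1e-3)

    def utility(s, a):
        # quadratic stage cost with identity weights (assumption for the sketch)
        return s.pow(2).sum(dim=1, keepdim=True) + a.pow(2).sum(dim=1, keepdim=True)

    states = torch.randn(256, state_dim)   # batch of sampled states
    for _ in range(50):
        # critic update: V_{i+1}(s) <- U(s, a) + gamma * V_i(f(s, a))
        a = action_net(states)
        next_states = model_net(torch.cat([states, a], dim=1))
        target = (utility(states, a) + gamma * critic_net(next_states)).detach()
        critic_loss = nn.functional.mse_loss(critic_net(states), target)
        opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

        # action update: push the action network toward minimizing the one-step Q-value
        a = action_net(states)
        next_states = model_net(torch.cat([states, a], dim=1))
        action_loss = (utility(states, a) + gamma * critic_net(next_states)).mean()
        opt_action.zero_grad(); action_loss.backward(); opt_action.step()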
A reinforcement learning approach to the stochastic cutting stock problem
We propose a formulation of the stochastic cutting stock problem as a discounted infinite-horizon Markov decision process. At each decision epoch, given current inventory of items, an agent chooses in which patterns to cut objects in stock in ... (a rough sketch of such a formulation follows this entry).
Anselmo R. Pitombeira-Neto +1 more
doaj +1 more source
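A rough sketch of how a cutting stock MDP of this kind could be encoded is given below; the pattern matrix, Poisson demand, and cost coefficients are invented for illustration and are not the authors' formulation.

    # Hypothetical stochastic cutting stock MDP: state = item inventory,
    # action = number of objects cut with each pattern, reward = negative cost.
    import numpy as np

    rng = np.random.default_rng(0)
    n_items, n_patterns = 3, 4
    # patterns[p, i] = copies of item i produced by cutting one object with pattern p
    patterns = np.array([[2, 0, 1],
                         [0, 3, 0],
                         [1, 1, 1],
                         [0, 0, 4]])
    cut_cost, holding_cost, gamma = 1.0, 0.1, 0.99

    def step(inventory, action):
        produced = action @ patterns                    # items obtained this epoch
        demand = rng.poisson(lam=2.0, size=n_items)     # stochastic demand
        next_inventory = np.maximum(inventory + produced - demand, 0)
        cost = cut_cost * action.sum() + holding_cost * next_inventory.sum()
        return next_inventory, -cost

    # one simulated epoch under an arbitrary action
    inv = np.zeros(n_items, dtype=int)
    inv, reward = step(inv, np.array([1, 0, 2, 0]))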
In this paper, a novel robust cooperative tracking control algorithm based on an adaptive dynamic programming approach is proposed for nonlinear multi-agent graphical games with disturbances.
Qiuxia Qu, Liangliang Sun, Zhigang Li
doaj +1 more source
An approximate dynamic programming method for unit-based small hydropower scheduling
Hydropower will become an important power source for China's power grids as they move toward carbon neutrality. In order to fully exploit the potential of water resources and achieve low-carbon operation, this paper proposes an approximate dynamic programming (ADP ...
Yueyang Ji, Hua Wei
doaj +1 more source
Data-Driven Suboptimal Scheduling of Switched Systems
In this paper, a data-driven optimal scheduling approach is investigated for continuous-time switched systems with unknown subsystems and infinite-horizon cost functions.
Chi Zhang +3 more
doaj +1 more source

