Results 21 to 30 of about 93,202 (286)

Approximate Policy Iteration for Markov Control Revisited

open access: diamond, Procedia Computer Science, 2012
Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API) is another RL technique, not as widely used as Q-Learning, based on modified policy ...
Abhijit Gosavi
openaire   +2 more sources
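The abstract above contrasts Q-Learning with approximate policy iteration; its key point, that Q-Learning bypasses the MDP's transition probabilities, can be illustrated with the standard tabular update. This is a minimal sketch, not the paper's implementation; the function name, dictionary representation, and default step sizes are illustrative assumptions.

```python
def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-Learning update from a single sampled transition
    (s, a, r, s_next). Note that no transition probabilities appear:
    the model is bypassed entirely, as the abstract describes."""
    # Q is a dict mapping (state, action) -> value estimate; unseen pairs are 0.
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

For example, starting from an empty table, one transition with reward 1.0 moves the estimate for that state-action pair from 0 toward the sampled target by the step size alpha.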

Softened approximate policy iteration for Markov games

open access: green, 2016
This paper reports theoretical and empirical investigations on the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov Games. First, it reveals that state-of-the-art algorithms can be derived by the direct application of Newton's method to different norms of the OBR.
Pérolat, Julien   +4 more
openaire   +3 more sources

Algorithms of approximate dynamic programming for hydro scheduling [PDF]

open access: yes, E3S Web of Conferences, 2020
In hydro scheduling, unit commitment is a complex sub-problem. This paper proposes a new approximate dynamic programming technique to solve unit commitment.
Parvez Iram, Shen Jianjian
doaj   +1 more source

Error propagation for approximate policy and value iteration

open access: green, 2010
We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy. We quantify the performance loss as the Lp norm of the approximation error/Bellman residual at each iteration.
Farahmand, Amir Massoud   +2 more
  +6 more sources
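The entry above analyzes how per-iteration error propagates through approximate value iteration; the exact procedure being approximated is easy to state. Below is a minimal sketch of exact value iteration on a finite MDP, with the L-infinity Bellman residual used as the stopping criterion; the data layout and function name are illustrative assumptions, not from the paper.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Exact value iteration on a finite MDP.
    P[s][a] is a list of (prob, s_next) pairs; R[s][a] is the immediate reward.
    Iterates the Bellman optimality operator until the residual
    max_s |V_new(s) - V(s)| falls below tol."""
    n = len(P)
    V = [0.0] * n
    while True:
        V_new = [max(R[s][a] + gamma * sum(p * V[sn] for p, sn in P[s][a])
                     for a in range(len(P[s])))
                 for s in range(n)]
        residual = max(abs(x - y) for x, y in zip(V_new, V))  # Bellman residual
        V = V_new
        if residual < tol:
            return V
```

In the approximate setting studied by the paper, each V_new is only computed up to some error, and the analysis bounds how those per-iteration errors accumulate in the value of the final greedy policy.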

A Nearer Optimal and Faster Trained Value Iteration ADP for Discrete-Time Nonlinear Systems

open access: yes, IEEE Access, 2021
Adaptive dynamic programming (ADP) is generally implemented using three neural networks: a model network, an action network, and a critic network. In conventional value iteration ADP, the model network is initialized randomly and trained by the ...
Junping Hu   +5 more
doaj   +1 more source

A reinforcement learning approach to the stochastic cutting stock problem

open access: yes, EURO Journal on Computational Optimization, 2022
We propose a formulation of the stochastic cutting stock problem as a discounted infinite-horizon Markov decision process. At each decision epoch, given current inventory of items, an agent chooses in which patterns to cut objects in stock in ...
Anselmo R. Pitombeira-Neto   +1 more
doaj   +1 more source

Adaptive Critic Design-Based Robust Cooperative Tracking Control for Nonlinear Multi-Agent Systems With Disturbances

open access: yes, IEEE Access, 2021
In this paper, a novel robust cooperative tracking control algorithm is proposed for nonlinear multi-agent graphical games with disturbances, based on an adaptive dynamic programming approach.
Qiuxia Qu, Liangliang Sun, Zhigang Li
doaj   +1 more source

An approximate dynamic programming method for unit-based small hydropower scheduling

open access: yes, Frontiers in Energy Research, 2022
Hydropower will become an important power source for China’s power grids as they move toward carbon neutrality. In order to fully exploit the potential of water resources and achieve low-carbon operation, this paper proposes an approximate dynamic programming (ADP ...
Yueyang Ji, Hua Wei
doaj   +1 more source

Data-Driven Suboptimal Scheduling of Switched Systems

open access: yes, Sensors, 2020
In this paper, a data-driven optimal scheduling approach is investigated for continuous-time switched systems with unknown subsystems and infinite-horizon cost functions.
Chi Zhang   +3 more
doaj   +1 more source
