Results 21 to 30 of about 93,202
Approximate Policy Iteration for Markov Control Revisited
Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API) is another RL technique, not as widely used as Q-Learning, based on modified policy ... (a minimal sketch of the model-free update follows this entry).
Abhijit Gosavi
openaire +2 more sources
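As a point of reference for the distinction this abstract draws, the sketch below shows the tabular Q-learning update, which is value-iteration-style and uses only sampled transitions, so the MDP's transition probabilities never appear explicitly; approximate policy iteration would instead alternate an approximate policy evaluation step with a greedy improvement step. The env interface, hyperparameters, and function name are assumptions for illustration, not Gosavi's code.

    # Minimal tabular Q-learning sketch (illustrative only).
    import random
    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.95, epsilon=0.1):
        """Assumes env.reset() -> s and env.step(a) -> (s_next, r, done)."""
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # epsilon-greedy action selection
                if random.random() < epsilon:
                    a = random.randrange(n_actions)
                else:
                    a = int(np.argmax(Q[s]))
                s_next, r, done = env.step(a)
                # backup on a single sampled transition; no transition matrix needed
                target = r + (0.0 if done else gamma * np.max(Q[s_next]))
                Q[s, a] += alpha * (target - Q[s, a])
                s = s_next
        return Q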
Softened approximate policy iteration for Markov games
This paper reports theoretical and empirical investigations on the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov Games. First, it reveals that state-of-the-art algorithms can be derived by the direct application of Newton's method to different norms of the OBR (a standard form of this residual is written out after this entry).
Pérolat, Julien +4 more
openaire +3 more sources
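To make the quantity concrete, one standard way to write the Optimal Bellman Residual and a Newton step on it is given below; the symbols (a parameterized Q-function Q_theta, the optimal minimax Bellman operator T*, and the residual norm J_p) are assumed for this sketch and need not match the paper's notation.

    % assumed notation: Q_theta parameterized Q-function, T* optimal Bellman operator
    \[
      \mathcal{J}_p(\theta) \;=\; \bigl\lVert \mathcal{T}^{*} Q_{\theta} - Q_{\theta} \bigr\rVert_{p}^{p}
    \]
    \[
      \theta_{k+1} \;=\; \theta_{k} - \bigl(\nabla^{2}_{\theta}\,\mathcal{J}_p(\theta_{k})\bigr)^{-1}\,
      \nabla_{\theta}\,\mathcal{J}_p(\theta_{k})
      \quad \text{(Newton step; quasi-Newton methods replace the Hessian with an approximation)}
    \]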
Approximate policy iteration using neural networks for storage problems [PDF]
26 pages, 5 ...
Dokka, Trivikram, Frimpong, Richlove
openaire +3 more sources
Algorithms of approximate dynamic programming for hydro scheduling [PDF]
In hydro scheduling, unit commitment is a complex sub-problem. This paper proposes a new approximate dynamic programming technique to solve unit commitment.
Parvez Iram, Shen Jianjian
doaj +1 more source
Error propagation for approximate policy and value iteration
We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration algorithms influences the quality of the resulting policy. We quantify the performance loss as the Lp norm of the approximation error/Bellman residual at each iteration (a classical bound of this type is sketched after this entry).
Farahmand, Amir Massoud +2 more
+6 more sources
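For context, the classical sup-norm error propagation bound for approximate policy iteration (Bertsekas and Tsitsiklis) has the form below; the paper above refines this style of result to Lp norms, with different constants.

    \[
      \limsup_{k \to \infty} \bigl\lVert V^{*} - V^{\pi_k} \bigr\rVert_{\infty}
      \;\le\; \frac{2\gamma}{(1-\gamma)^{2}}\,
      \limsup_{k \to \infty} \bigl\lVert \epsilon_k \bigr\rVert_{\infty}
    \]

Here epsilon_k is the approximation error of the value estimate at iteration k and gamma is the discount factor.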
A Nearer Optimal and Faster Trained Value Iteration ADP for Discrete-Time Nonlinear Systems
Adaptive dynamic programming (ADP) is generally implemented using three neural networks: a model network, an action network, and a critic network. In conventional value iteration ADP, the model network is initialized randomly and trained by the ... (an illustrative sketch of this three-network structure follows this entry).
Junping Hu +5 more
doaj +1 more source
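To illustrate the three-network structure the abstract refers to, a rough value-iteration ADP sweep is sketched below; the network sizes, cost function, and training loop are placeholders (in practice the model network would first be trained on system data), and nothing here is the authors' implementation.

    # Illustrative value-iteration ADP structure with model, critic, and action networks.
    import torch
    import torch.nn as nn

    def mlp(inp, out, hidden=32):
        return nn.Sequential(nn.Linear(inp, hidden), nn.Tanh(), nn.Linear(hidden, out))

    state_dim, action_dim, gamma = 4, 1, 0.95
    model_net = mlp(state_dim + action_dim, state_dim)   # predicts the next state
    critic_net = mlp(state_dim, 1)                       # approximates the value function
    action_net = mlp(state_dim, action_dim)              # approximates the control policy
    opt_critic = torch.optim.Adam(critic_net.parameters(), lr=1e-3)
    opt_action = torch.optim.Adam(action_net.parameters(), lr=1e-3)

    def utility(s, a):
        # quadratic stage cost with identity weights (assumption for the sketch)
        return s.pow(2).sum(dim=1, keepdim=True) + a.pow(2).sum(dim=1, keepdim=True)

    states = torch.randn(256, state_dim)   # batch of sampled states
    for _ in range(50):
        # critic update: V_{i+1}(s) <- U(s, a) + gamma * V_i(f(s, a))
        a = action_net(states)
        next_states = model_net(torch.cat([states, a], dim=1))
        target = (utility(states, a) + gamma * critic_net(next_states)).detach()
        critic_loss = nn.functional.mse_loss(critic_net(states), target)
        opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

        # action update: push the action network toward minimizing the one-step Q-value
        a = action_net(states)
        next_states = model_net(torch.cat([states, a], dim=1))
        action_loss = (utility(states, a) + gamma * critic_net(next_states)).mean()
        opt_action.zero_grad(); action_loss.backward(); opt_action.step()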
A reinforcement learning approach to the stochastic cutting stock problem
We propose a formulation of the stochastic cutting stock problem as a discounted infinite-horizon Markov decision process. At each decision epoch, given current inventory of items, an agent chooses in which patterns to cut objects in stock in ... (a rough sketch of such a formulation follows this entry).
Anselmo R. Pitombeira-Neto +1 more
doaj +1 more source
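A rough sketch of how a cutting stock MDP of this kind could be encoded is given below; the pattern matrix, Poisson demand, and cost coefficients are invented for illustration and are not the authors' formulation.

    # Hypothetical stochastic cutting stock MDP: state = item inventory,
    # action = number of objects cut with each pattern, reward = negative cost.
    import numpy as np

    rng = np.random.default_rng(0)
    n_items, n_patterns = 3, 4
    # patterns[p, i] = copies of item i produced by cutting one object with pattern p
    patterns = np.array([[2, 0, 1],
                         [0, 3, 0],
                         [1, 1, 1],
                         [0, 0, 4]])
    cut_cost, holding_cost, gamma = 1.0, 0.1, 0.99

    def step(inventory, action):
        produced = action @ patterns                    # items obtained this epoch
        demand = rng.poisson(lam=2.0, size=n_items)     # stochastic demand
        next_inventory = np.maximum(inventory + produced - demand, 0)
        cost = cut_cost * action.sum() + holding_cost * next_inventory.sum()
        return next_inventory, -cost

    # one simulated epoch under an arbitrary action
    inv = np.zeros(n_items, dtype=int)
    inv, reward = step(inv, np.array([1, 0, 2, 0]))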
In this paper, a novel robust cooperative tracking control algorithm based on an adaptive dynamic programming approach is proposed for nonlinear multi-agent graphical games with disturbances.
Qiuxia Qu, Liangliang Sun, Zhigang Li
doaj +1 more source
An approximate dynamic programming method for unit-based small hydropower scheduling
Hydropower will become an important power source for China's power grids as they move toward carbon neutrality. In order to fully exploit the potential of water resources and achieve low-carbon operation, this paper proposes an approximate dynamic programming (ADP ...
Yueyang Ji, Hua Wei
doaj +1 more source
Data-Driven Suboptimal Scheduling of Switched Systems
In this paper, a data-driven optimal scheduling approach is investigated for continuous-time switched systems with unknown subsystems and infinite-horizon cost functions.
Chi Zhang +3 more
doaj +1 more source

