Deep approximate policy iteration
The Annals of Statistics
Jiao, Yuling +4 more
Empirical policy iteration for approximate dynamic programming
53rd IEEE Conference on Decision and Control, 2014
We propose a simulation based algorithm, Empirical Policy Iteration (EPI) algorithm, for finding the optimal policy function of an MDP with infinite horizon discounted cost criteria when the transition kernels are unknown. Unlike simulation based algorithms using stochastic approximation techniques which give only asymptotic convergence results, we ...
William B. Haskell +2 more
Approximate finite-horizon optimal control with policy iteration
Proceedings of the 33rd Chinese Control Conference, 2014
In this paper, the policy iteration algorithm for the finite-horizon optimal control of continuous time systems is addressed. The finite-horizon optimal control with input constraints is formulated in the Hamilton-Jacobi-Bellman (HJB) equation by using a suitable nonquadratic function.
Zhengen Zhao, Ying Yang, Hao Li, Dan Liu
Approximate Policy Iteration with Bellman Residuals Minimization
2014
Reinforcement Learning (RL) provides a general methodology to solve complex uncertain decision problems, which are very challenging in many real-world applications. The RL problem is modeled as a Markov Decision Process (MDP), which is deeply studied in the literature. We consider Policy Iteration (PI) algorithms for RL which iteratively evaluate and improve control
Esposito Gennaro, Martin Mario
Hierarchical Approximate Policy Iteration With Binary-Tree State Space Decomposition
IEEE Transactions on Neural Networks, 2011
In recent years, approximate policy iteration (API) has attracted increasing attention in reinforcement learning (RL), e.g., least-squares policy iteration (LSPI) and its kernelized version, the kernel-based LSPI algorithm. However, it remains difficult for API algorithms to obtain near-optimal policies for Markov decision processes (MDPs) with large ...
Xin, Xu +3 more
Filter based Explorized Policy Iteration Algorithm for On-Policy Approximate LQR
2019 IEEE Symposium Series on Computational Intelligence (SSCI), 2019
A filter-based policy iteration (PI) algorithm has been proposed to design an adaptive optimal controller (AOC) for uncertain continuous linear time invariant (LTI) systems. A novel two-layered filtering architecture is introduced in the PI algorithm: the first layer filters tactically eliminate the need for state derivative knowledge and finite window
Sumit Kumar Jha +2 more
Robotic Knee Parameter Tuning Using Approximate Policy Iteration
2019
This paper presents an online model-free reinforcement learning based controller realized by approximate dynamic programming for a robotic knee as part of a human-machine system. Traditionally, prosthesis wearers’ gait performance is improved by manually tuning the impedance parameters.
Xiang Gao +4 more
Simulation-Based Approximate Policy Iteration with Generalized Logistic Functions
INFORMS Journal on Computing, 2015
We present an approximate dynamic programming method based on simulation, policy iteration, a postdecision state formulation, and a logistic value function approximation. This method was developed as part of our efforts to determine whether nonlinear value function approximations could provide cost-effective policies for advance patient scheduling ...
Antoine Sauré +2 more
Reordering Sparsification of Kernel Machines in Approximate Policy Iteration
2009
Approximate policy iteration (API), which includes least-squares policy iteration (LSPI) and its kernelized version (KLSPI), has received increasing attention due to its good convergence and generalization abilities in solving difficult reinforcement learning problems.
Chunming Liu +3 more
Approximate Policy Iteration With Deep Minimax Average Bellman Error Minimization
IEEE Transactions on Neural Networks and Learning Systems
In this work, we investigate the utilization of deep approximate policy iteration (DAPI) in estimating the optimal action-value function within the context of reinforcement learning, employing rectified linear unit (ReLU) ResNet as the underlying framework.
Lican Kang +5 more

