Some of the following articles may not be open access.

Deep approximate policy iteration

The Annals of Statistics
Jiao, Yuling   +4 more

Empirical policy iteration for approximate dynamic programming

53rd IEEE Conference on Decision and Control, 2014
We propose a simulation-based algorithm, the Empirical Policy Iteration (EPI) algorithm, for finding the optimal policy function of an MDP with an infinite-horizon discounted cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms using stochastic approximation techniques, which give only asymptotic convergence results, we ...
William B. Haskell   +2 more

Approximate finite-horizon optimal control with policy iteration

Proceedings of the 33rd Chinese Control Conference, 2014
In this paper, the policy iteration algorithm for the finite-horizon optimal control of continuous time systems is addressed. The finite-horizon optimal control with input constraints is formulated in the Hamilton-Jacobi-Bellman (HJB) equation by using a suitable nonquadratic function.
Zhengen Zhao, Ying Yang, Hao Li, Dan Liu

Approximate Policy Iteration with Bellman Residuals Minimization

2014
Reinforcement Learning (RL) provides a general methodology for solving complex, uncertain decision problems, which are very challenging in many real-world applications. The RL problem is modeled as a Markov Decision Process (MDP), which has been studied extensively in the literature. We consider Policy Iteration (PI) algorithms for RL which iteratively evaluate and improve control
Esposito Gennaro, Martin Mario

Hierarchical Approximate Policy Iteration With Binary-Tree State Space Decomposition

IEEE Transactions on Neural Networks, 2011
In recent years, approximate policy iteration (API) has attracted increasing attention in reinforcement learning (RL), e.g., least-squares policy iteration (LSPI) and its kernelized version, the kernel-based LSPI algorithm. However, it remains difficult for API algorithms to obtain near-optimal policies for Markov decision processes (MDPs) with large ...
Xin, Xu   +3 more

Filter based Explorized Policy Iteration Algorithm for On-Policy Approximate LQR

2019 IEEE Symposium Series on Computational Intelligence (SSCI), 2019
A filter-based policy iteration (PI) algorithm has been proposed to design an adaptive optimal controller (AOC) for uncertain continuous linear time-invariant (LTI) systems. A novel two-layered filtering architecture is introduced in the PI algorithm: the first layer filters tactically eliminate the need for state derivative knowledge and finite window
Sumit Kumar Jha   +2 more

Robotic Knee Parameter Tuning Using Approximate Policy Iteration

2019
This paper presents an online model-free reinforcement learning based controller realized by approximate dynamic programming for a robotic knee as part of a human-machine system. Traditionally, prosthesis wearers’ gait performance is improved by manually tuning the impedance parameters.
Xiang Gao   +4 more

Simulation-Based Approximate Policy Iteration with Generalized Logistic Functions

INFORMS Journal on Computing, 2015
We present an approximate dynamic programming method based on simulation, policy iteration, a postdecision state formulation, and a logistic value function approximation. This method was developed as part of our efforts to determine whether nonlinear value function approximations could provide cost-effective policies for advance patient scheduling ...
Antoine Sauré   +2 more

Reordering Sparsification of Kernel Machines in Approximate Policy Iteration

2009
Approximate policy iteration (API), which includes least-squares policy iteration (LSPI) and its kernelized version (KLSPI), has received increasing attention due to its good convergence and generalization abilities in solving difficult reinforcement learning problems.
Chunming Liu   +3 more

Approximate Policy Iteration With Deep Minimax Average Bellman Error Minimization

IEEE Transactions on Neural Networks and Learning Systems
In this work, we investigate the utilization of deep approximate policy iteration (DAPI) in estimating the optimal action-value function within the context of reinforcement learning, employing rectified linear unit (ReLU) ResNet as the underlying framework.
Lican Kang   +5 more
