
Excess mortality in Mainland China after the end of the Zero COVID policy: A systematic review. [PDF]

open access: yes
Epidemiol Infect
Fung IC   +10 more
europepmc   +1 more source

Fitted policy search

2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2011
In this paper we address the combination of batch reinforcement-learning (BRL) techniques with direct policy search (DPS) algorithms in the context of robot learning. Batch value-based algorithms (such as fitted Q-iteration) have been proved to outperform online ones in many complex applications, but they share the same difficulties in solving problems
Migliavacca, Martino   +4 more
openaire   +1 more source

Covariant Policy Search

2003
We investigate the problem of non-covariant behavior of policy gradient reinforcement learning algorithms. The policy gradient approach is amenable to analysis by information geometric methods. This leads us to propose a natural metric on controller parameterization that results from considering the manifold of probability distributions over paths ...
J. Andrew Bagnell, Jeff Schneider
openaire   +1 more source

Nonconvex Policy Search Using Variational Inequalities

Neural Computation, 2017
Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that potentially could damage hardware units.
Zhan, Yusen   +2 more
openaire   +3 more sources

Policy Search by Dynamic Programming

2003
We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees.
J. Andrew Bagnell   +3 more
openaire   +1 more source
