Some of the following articles may not be open access.
IEEE International Conference on Robotics and Automation, 2022
Recent Multi-Agent Deep Reinforcement Learning approaches factorize a global action-value to address non-stationarity and favor cooperation. These methods, however, hinder exploration by introducing constraints (e.g., additive value-decomposition) to ...
Enrico Marchesini, A. Farinelli
semanticscholar +1 more source
Neuro-Evolutionary Direct Policy Search for Multiobjective Optimal Control
IEEE Transactions on Neural Networks and Learning Systems, 2021
Direct policy search (DPS) is emerging as one of the most effective and widely applied reinforcement learning (RL) methods to design optimal control policies for multiobjective Markov decision processes (MOMDPs).
M. Zaniolo, M. Giuliani, A. Castelletti
semanticscholar +1 more source
Water Resources Research, 2020
Policy search methods provide a heuristic mapping between observations and decisions and have been widely used in reservoir control studies. However, recent studies have observed a tendency for policy search methods to overfit to the hydrologic data used
Z. Brodeur, J. Herman, S. Steinschneider
semanticscholar +1 more source
2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2011
In this paper we address the combination of batch reinforcement-learning (BRL) techniques with direct policy search (DPS) algorithms in the context of robot learning. Batch value-based algorithms (such as fitted Q-iteration) have been proved to outperform online ones in many complex applications, but they share the same difficulties in solving problems
MIGLIAVACCA, MARTINO +4 more
openaire +1 more source
2003
We investigate the problem of non-covariant behavior of policy gradient reinforcement learning algorithms. The policy gradient approach is amenable to analysis by information geometric methods. This leads us to propose a natural metric on controller parameterization that results from considering the manifold of probability distributions over paths ...
J. Andrew Bagnell, Jeff Schneider
openaire +1 more source
Nonconvex Policy Search Using Variational Inequalities
Neural Computation, 2017
Policy search is a class of reinforcement learning algorithms for finding optimal policies in control problems with limited feedback. These methods have been shown to be successful in high-dimensional problems such as robotics control. Though successful, current methods can lead to unsafe policy parameters that potentially could damage hardware units.
Zhan, Yusen +2 more
openaire +3 more sources
Policy Search by Dynamic Programming
2003
We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees.
J. Andrew Bagnell +3 more
openaire +1 more source
Numerical Quadrature for Probabilistic Policy Search
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020
Learning control policies has become an appealing alternative to the derivation of control laws based on classic control theory. Model-based approaches have proven an outstanding data efficiency, especially when combined with probabilistic models to eliminate model bias. However, a major difficulty for these methods is that multi-step-ahead predictions
Julia Vinogradska +4 more
openaire +3 more sources
Relative Entropy Policy Search
Proceedings of the AAAI Conference on Artificial Intelligence, 2010
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients, many of these problems may be addressed by constraining the
Peters, J., Mülling, K., Altun, Y.
openaire +2 more sources
Distributed Fusion-Based Policy Search for Fast Robot Locomotion Learning
IEEE Computational Intelligence Magazine, 2019
Deep reinforcement learning methods are developed to deal with challenging locomotion control problems in a robotics domain and can achieve significant performance improvement over conventional control methods.
Zhengcai Cao, Qing Xiao, Mengchu Zhou
semanticscholar +1 more source

