Q-learning is a regression-based approach that is widely used to formalize the development of an optimal dynamic treatment strategy. Finite dimensional working models are typically used to estimate certain nuisance parameters, and misspecification of ...
Robert L. Strawderman (2880557) +3 more
core +5 more sources
The Q-learning algorithm is known to be affected by the maximization bias, i.e. the systematic overestimation of action values, an important issue that has recently received renewed attention. Double Q-learning has been proposed as an efficient algorithm
Zhu, Rong, Rigotti, Mattia
core +3 more sources
Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning [PDF]
A deep Q-network (DQN) (Mnih et al., 2013), a typical deep reinforcement learning method, is an extension of Q-learning. In DQN, a Q function expresses all action values under all states, and it is approximated using a convolutional neural ...
Shota Ohnishi +6 more
doaj +3 more sources
We introduce a new convergent variant of Q-learning, called speedy Q-learning, to address the problem of slow convergence in the standard form of the Q-learning algorithm.
Kappen, Hilbert +3 more
core +5 more sources
The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited.
Lee, Donghwan, He, Niao
core +4 more sources
In some stochastic environments the well-known reinforcement learning algorithm Q-learning performs very poorly. This poor performance is caused by large overestimations of action values, which result from a positive bias that is introduced because Q ...
Hasselt, H. P. (Hado) van
core +4 more sources
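The positive bias described in the snippet above can be illustrated with a small simulation. This is a minimal sketch, not the method of any listed paper: it assumes every action has a true value of 0 and unit-variance Gaussian reward noise, and the function names (`single_estimator`, `double_estimator`, `simulate`) are hypothetical. Taking the maximum over noisy sample means overestimates the true maximum, while selecting the action on one half of the samples and evaluating it on the other half (the idea behind a double estimator) does not.

```python
import random


def single_estimator(samples_per_action):
    """Max over per-action sample means (the estimate a plain max uses)."""
    return max(sum(s) / len(s) for s in samples_per_action)


def double_estimator(samples_per_action):
    """Pick the best action on one half of the data, evaluate it on the other."""
    means_a, means_b = [], []
    for s in samples_per_action:
        half = len(s) // 2
        means_a.append(sum(s[:half]) / half)
        means_b.append(sum(s[half:]) / (len(s) - half))
    best = max(range(len(means_a)), key=lambda i: means_a[i])
    return means_b[best]


def simulate(num_actions=10, num_samples=20, trials=2000, seed=0):
    """Average both estimates over many trials; the true maximum value is 0."""
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Assumed setting: every action has true value 0 and N(0, 1) noise.
        samples = [
            [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
            for _ in range(num_actions)
        ]
        single_total += single_estimator(samples)
        double_total += double_estimator(samples)
    return single_total / trials, double_total / trials


if __name__ == "__main__":
    single_avg, double_avg = simulate()
    print(f"single estimator average: {single_avg:.3f}")
    print(f"double estimator average: {double_avg:.3f}")
```

In this setting the single-estimator average should come out clearly above 0, while the double-estimator average should stay close to 0.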
An Improved Q-Learning Algorithm and Its Application in Path Planning
The traditional Q-Learning algorithm suffers from too many random searches and slow convergence. This paper therefore proposes an improved ε-Q-Learning algorithm, based on the traditional Q-Learning algorithm, and applies it to path planning. The
Guojun MAO, Shimin GU
doaj +1 more source
Multi-Source Multi-Destination Hybrid Infrastructure-Aided Traffic Aware Routing in V2V/I Networks
The concept of the “connected car” offers the potential for safer, more enjoyable and more efficient driving and eventually autonomous driving.
Teodor Ivanescu +3 more
doaj +1 more source
Generalized Speedy Q-Learning [PDF]
In this paper, we derive a generalization of the Speedy Q-learning (SQL) algorithm that was proposed in the Reinforcement Learning (RL) literature to handle slow convergence of Watkins' Q-learning. In most RL algorithms such as Q-learning, the Bellman equation and the Bellman operator play an important role.
Indu John +2 more
openaire +2 more sources
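For reference, the Bellman operator mentioned in the snippet above is usually written as follows. The notation is the standard textbook one and is an assumption here, not taken from the listed paper: $r$ is the reward function, $P$ the transition kernel, $\gamma \in [0,1)$ the discount factor, and $\alpha_k$ the step size.

```latex
% Bellman optimality operator applied to an action-value function Q
(\mathcal{T}Q)(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q(s',a')

% The optimal action-value function is the fixed point of the operator
Q^{*} = \mathcal{T}Q^{*}

% Watkins' Q-learning update from a sampled transition (s, a, r, s')
Q_{k+1}(s,a) = (1-\alpha_k)\, Q_k(s,a) + \alpha_k \bigl( r + \gamma \max_{a'} Q_k(s',a') \bigr)
```

The update in the last line replaces the expectation over $s'$ in $\mathcal{T}$ with a single sampled next state, which is one reason convergence can be slow.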

