Some of the following articles may not be open access.

Double Q-learning

2010
In some stochastic environments the well-known reinforcement learning algorithm Q-learning performs very poorly. This poor performance is caused by large overestimations of action values. These overestimations result from a positive bias that is introduced because Q-learning uses the maximum action value as an approximation for the maximum expected ...
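
The bias described in this abstract can be reproduced with a small simulation. The sketch below is illustrative only and is not taken from the article: it assumes ten actions whose true values are all 0 with unit-variance Gaussian noise, so the true maximum expected value is 0. It compares the maximum of the sample means with a two-sample "double" estimator that selects the action on one half of the data and evaluates it on the other half. The function names, parameters, and the choice of the double estimator as the point of comparison are assumptions made for this example.

```python
import random


def single_estimator(samples):
    """Maximum over per-action sample means (the max-based estimate)."""
    return max(sum(s) / len(s) for s in samples)


def double_estimator(samples):
    """Pick the best action on one half of the samples, evaluate it on the other half."""
    means_a, means_b = [], []
    for s in samples:
        half = len(s) // 2
        means_a.append(sum(s[:half]) / half)
        means_b.append(sum(s[half:]) / (len(s) - half))
    best = max(range(len(samples)), key=means_a.__getitem__)
    return means_b[best]


def average_estimates(n_actions=10, n_samples=10, n_trials=5000, seed=0):
    """Average both estimators over many trials; every true action value is 0 (assumed setup)."""
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(n_trials):
        samples = [
            [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for _ in range(n_actions)
        ]
        single_total += single_estimator(samples)
        double_total += double_estimator(samples)
    return single_total / n_trials, double_total / n_trials


if __name__ == "__main__":
    single, double = average_estimates()
    print(f"max of sample means: {single:.3f}")
    print(f"double estimator:    {double:.3f}")
```

Under these assumptions the max-based average should come out clearly above 0, while the double estimator should stay close to 0, at the cost of using only half of the samples for each step.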

Neural Q-Learning

2000
ten Hagen, S.H.G., Kröse, B.J.A.
