Lagrangian index policy for restless bandits with average reward
We study the Lagrangian Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with that of the Whittle Index Policy (WIP); both are heuristic policies known to be asymptotically optimal under certain natural conditions.
Konstantin Avrachenkov et al.
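As a rough illustration of how an index policy such as WIP is computed, the sketch below finds the Whittle index of one arm state by bisecting on a passive subsidy λ. Each subsidized single-arm problem is solved under the average-reward criterion with relative value iteration, and the index is the subsidy at which being active and being passive are equally attractive. It assumes the arm is indexable and its MDP is unichain and aperiodic. The function names, the search bracket [-10, 10], and the tolerances are hypothetical choices. The LIP studied in the abstract is not reproduced here.

```python
import numpy as np


def relative_value_iteration(P0, P1, r0, r1, lam, ref=0, iters=10_000, tol=1e-10):
    """Average-reward RVI for one arm where the passive action earns subsidy lam.

    Returns (q0, q1): relative action values for passive (0) and active (1).
    Assumes a unichain, aperiodic arm so that RVI converges.
    """
    P0, P1 = np.asarray(P0, float), np.asarray(P1, float)
    r0, r1 = np.asarray(r0, float), np.asarray(r1, float)
    h = np.zeros(len(r0))
    for _ in range(iters):
        q0 = r0 + lam + P0 @ h
        q1 = r1 + P1 @ h
        h_new = np.maximum(q0, q1)
        h_new = h_new - h_new[ref]  # normalize at a reference state
        done = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if done:
            break
    return r0 + lam + P0 @ h, r1 + P1 @ h


def whittle_index(P0, P1, r0, r1, state, lo=-10.0, hi=10.0, steps=60):
    """Bisect on the subsidy until active and passive tie in `state`."""
    for _ in range(steps):
        lam = 0.5 * (lo + hi)
        q0, q1 = relative_value_iteration(P0, P1, r0, r1, lam)
        if q1[state] > q0[state]:
            lo = lam  # active still preferred, so the index exceeds lam
        else:
            hi = lam
    return 0.5 * (lo + hi)


def index_policy(indices, m):
    """Activate the m arms whose current-state index is largest."""
    return [int(i) for i in np.argsort(indices)[::-1][:m]]
```

When both actions share the same transition matrix, the future is unaffected by the action, so the index of each state reduces to the immediate reward gap `r1[s] - r0[s]`. This gives a quick sanity check for the bisection.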
Attenuated Directed Exploration during Reinforcement Learning in Gambling Disorder.
Wiehler A, Chakroun K, Peters J.
Meta-control of the exploration-exploitation dilemma emerges from probabilistic inference over a hierarchy of time scales.
Marković D, Goschke T, Kiebel SJ.
Model Predictive Control is Almost Optimal for Restless Bandit
Reviewed and accepted to COLT ...
Nicolas Gast, Dheeraj Narasimha
Indexability and Optimal Index Policies for a Class of Reinitialising Restless Bandits.
Villar SS.
Low-complexity algorithm for restless bandits with imperfect observations
We consider a class of restless bandit problems with broad applications in reinforcement learning and stochastic optimization. The model consists of $N$ independent discrete-time Markov processes, each of which has two possible states: 1 and 0 ('good' and 'bad'). Reward accrues only if a process is both in state 1 and observed to be so.
Keqin Liu et al.
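A minimal simulation sketch of the two-state model described above, under several assumptions not stated in the snippet. All $N$ processes share hypothetical transition probabilities `p01` (0 to 1) and `p11` (1 to 1), `m` processes are observed per slot, and a hypothetical imperfect-observation model applies: state 1 is detected with probability `q`, and state 0 never yields a false detection. It tracks each process's belief P(state = 1) and runs a simple myopic policy that observes the highest-belief processes. This is not the low-complexity algorithm of the paper.

```python
import random


def propagate(w, p01, p11):
    """One-step prediction of P(state = 1) for a two-state Markov chain."""
    return w * p11 + (1.0 - w) * p01


def posterior_after_miss(w, q):
    """P(state = 1 | observed, no detection) under the assumed model:
    state 1 is detected with probability q; state 0 never triggers a detection."""
    denom = 1.0 - w * q  # P(no detection) = w*(1 - q) + (1 - w)
    return 0.0 if denom == 0.0 else w * (1.0 - q) / denom


def myopic_select(beliefs, m):
    """Observe the m processes with the highest belief (expected reward q * w)."""
    return sorted(range(len(beliefs)), key=lambda i: beliefs[i], reverse=True)[:m]


def simulate(n=5, m=2, p01=0.2, p11=0.8, q=0.9, horizon=1000, seed=0):
    """Average reward per slot of the myopic policy (hypothetical parameters)."""
    rng = random.Random(seed)
    stationary = p01 / (1.0 - p11 + p01)  # stationary P(state = 1)
    states = [rng.random() < stationary for _ in range(n)]
    beliefs = [stationary] * n
    total = 0
    for _ in range(horizon):
        chosen = set(myopic_select(beliefs, m))
        for i in range(n):
            if i in chosen and states[i] and rng.random() < q:
                total += 1  # reward: in state 1 AND observed to be so
                post = 1.0
            elif i in chosen:
                post = posterior_after_miss(beliefs[i], q)
            else:
                post = beliefs[i]  # unobserved: no new information
            beliefs[i] = propagate(post, p01, p11)
        # Advance the true (hidden) states by one Markov step.
        states = [rng.random() < (p11 if s else p01) for s in states]
    return total / horizon
```

With `q = 0` an observation carries no information, so `posterior_after_miss` returns the prior belief unchanged.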
"Watching my family being killed by terrorists made me really depressed": Mental health experiences, challenges and needed support of young internally displaced persons in northern Nigeria.
Olufadewa II et al.
Adaptive tuning of human learning and choice variability to unexpected uncertainty.
Lee JK, Rouault M, Wyart V.
Caching Contents with Varying Popularity Using Restless Bandits
There were mistakes in submitting the updated version.
K. J. Pavamana, Chandramani Singh
Dopaminergic modulation of the exploration/exploitation trade-off in human decision-making.
Chakroun K et al.

