Restless bandits - Open Access .click

Lagrangian index policy for restless bandits with average reward

Queueing Systems
We study the Lagrangian Index Policy (LIP) for restless multi-armed bandits with long-run average reward. In particular, we compare the performance of LIP with the performance of the Whittle Index Policy (WIP), both heuristic policies known to be asymptotically optimal under certain natural conditions.
Konstantin Avrachenkov, Vivek S. Borkar, Pratik Shah +2 more
openaire +2 more sources

Attenuated Directed Exploration during Reinforcement Learning in Gambling Disorder.

J Neurosci, 2021
Wiehler A, Chakroun K, Peters J.
europepmc +1 more source

Meta-control of the exploration-exploitation dilemma emerges from probabilistic inference over a hierarchy of time scales.

Cogn Affect Behav Neurosci, 2021
Marković D, Goschke T, Kiebel SJ.
europepmc +1 more source

Model Predictive Control is Almost Optimal for Restless Bandit

CoRR
Reviewed and accepted to COLT ...
Gast, Nicolas, Narasimha, Dheeraj
openaire +3 more sources

Indexability and optimal index policies for a class of reinitialising restless bandits.

Probab Eng Inf Sci, 2016
Villar SS.
europepmc +1 more source

Low-complexity algorithm for restless bandits with imperfect observations

Mathematical Methods of Operations Research
We consider a class of restless bandit problems with broad applications in reinforcement learning and stochastic optimization. We consider $N$ independent discrete-time Markov processes, each of which has two possible states: 1 and 0 ('good' and 'bad'). Reward accrues only if a process is both in state 1 and observed to be so.
Keqin Liu, Richard Weber 0003, Chengzhong Zhang +2 more
openaire +2 more sources
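
The model in the abstract above can be illustrated with a minimal simulation sketch. Everything beyond the abstract is an assumption made for illustration: the transition probabilities `p01` and `p11`, the budget `k_observe` of arms observed per step, the simplification that an observed arm reveals its state exactly, and the myopic rule of observing the arms with the highest belief. It is not the low-complexity algorithm of the paper.

```python
import random


def step_belief(belief, p01, p11):
    """Propagate P(state == 1) one step through the two-state chain."""
    return belief * p11 + (1.0 - belief) * p01


def simulate(n_arms=5, k_observe=1, p01=0.2, p11=0.8, horizon=1000, seed=0):
    """Average reward per step under a myopic observe-the-best-belief rule.

    p01 = P(0 -> 1), p11 = P(1 -> 1); both are hypothetical values.
    """
    rng = random.Random(seed)
    states = [rng.randint(0, 1) for _ in range(n_arms)]
    # Start every belief at the stationary probability of state 1.
    stationary = p01 / (p01 + 1.0 - p11)
    beliefs = [stationary] * n_arms
    total_reward = 0

    for _ in range(horizon):
        # Assumed myopic rule: observe the k arms most likely to be in state 1.
        ranked = sorted(range(n_arms), key=lambda i: beliefs[i], reverse=True)
        chosen = set(ranked[:k_observe])

        for i in range(n_arms):
            if i in chosen:
                # Reward accrues only if the arm is in state 1 AND observed.
                total_reward += states[i]
                # Assumed perfect observation: the belief collapses to the state.
                beliefs[i] = float(states[i])
            # Restless: every arm evolves whether or not it was observed.
            p_next_good = p11 if states[i] == 1 else p01
            states[i] = 1 if rng.random() < p_next_good else 0
            beliefs[i] = step_belief(beliefs[i], p01, p11)

    return total_reward / horizon


if __name__ == "__main__":
    print(simulate())
```

Because unobserved arms keep evolving, their beliefs drift toward the stationary probability, which is what separates this setting from a classical (rested) bandit.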

"Watching my family being killed by terrorists made me really depressed": Mental health experiences, challenges and needed support of young internally displaced persons in northern Nigeria.

J Migr Health, 2022
Olufadewa II, Adesina MA, Oladele RI, Ayorinde TA. +3 more
europepmc +1 more source

Adaptive tuning of human learning and choice variability to unexpected uncertainty.

Sci Adv, 2023
Lee JK, Rouault M, Wyart V.
europepmc +1 more source

Caching Contents with Varying Popularity Using Restless Bandits

There were mistakes while submitting the updated version.
K. J. Pavamana, Chandramani Singh
openaire +3 more sources

Dopaminergic modulation of the exploration/exploitation trade-off in human decision-making.

Elife, 2020
Chakroun K +4 more
europepmc +1 more source

fos: computer and information sciences
machine learning cs.lg
computer science - machine learning

fos: mathematics
markov and semi-markov decision processes
whittle index

index policies
optimization and control math.oc
indexability