Restless bandits - Open Access .click


Some of the following articles may not be open access.

An online algorithm for the risk-aware restless bandit

European Journal of Operational Research, 2021
Jianyu Xu, Lujie Chen, Ou Tang

Group Maintenance: A Restless Bandits Approach

INFORMS Journal on Computing, 2019
We consider a maintenance planner problem to dynamically allocate the available repairmen to a system of unreliable production facilities. Each facility has several machines that incur a linear production loss due to stochastic degradation, which we model as a continuous time Markov process with fully observable states.
Abderrahmane Abbou, Viliam Makis

Restless Hidden Markov Bandit with Linear Rewards

2020 59th IEEE Conference on Decision and Control (CDC), 2020
This paper presents an algorithm and regret analysis for the restless hidden Markov bandit problem with linear rewards. In this problem the reward received by the decision maker is a random linear function which depends on the arm selected and a hidden state.
Michal Yemini, Amir Leshem, Anelia Somekh-Baruch +2 more

Optimal target tracking with restless bandits

Digital Signal Processing, 2006
This paper examines the problem of adaptive beam scheduling to minimise target tracking error with a phased array radar. It is shown that this can be posed in a framework that is similar to a particular type of dynamic programming problem known as the restless bandit problem. We will show that when the problem is put in this framework it has ...
Barbara F. La Scala, William Moran

On an index policy for restless bandits

Journal of Applied Probability, 1990
We investigate the optimal allocation of effort to a collection of n projects. The projects are ‘restless’ in that the state of a project evolves in time, whether or not it is allocated effort. The evolution of the state of each project follows a Markov rule, but transitions and rewards depend on whether or not the project receives effort.
Richard R. Weber, Gideon Weiss

Wireless Channel Selection with Restless Bandits

2017
Wireless devices are often able to communicate on several alternative channels; for example, cellular phones may use several frequency bands and are equipped with base-station communication capability together with WiFi and Bluetooth communication. Automatic decision support systems in such devices need to decide which channels to use at any given time ...
Julia Kuhn, Yoni Nazarathy

Index policies for a class of discounted restless bandits

Advances in Applied Probability, 2002
The paper concerns a class of discounted restless bandit problems which possess an indexability property. Conservation laws yield an expression for the reward suboptimality of a general policy. These results are utilised to study the closeness to optimality of an index policy for a special class of simple and natural dual speed restless bandits for ...
K. D. Glazebrook, J. Niño-Mora, P. S. Ansell +2 more

Restless bandits that hide their hand and recommendation systems

2017 9th International Conference on Communication Systems and Networks (COMSNETS), 2017
We consider a restless multi-armed bandit (RMAB) in which each arm can be in one of two states, say 0 or 1. Playing the arm brings it to state 0 with probability one and not playing it induces state transitions with arm-dependent probabilities. Playing an arm generates a unit reward with a probability that depends on the state of the arm.
Rahul Meshram, Aditya Gopalan, D. Manjunath +2 more

Towards Q-learning the Whittle Index for Restless Bandits

2019 Australian & New Zealand Control Conference (ANZCC), 2019
We consider the multi-armed restless bandit problem (RMABP) with an infinite horizon average cost objective. Each arm of the RMABP is associated with a Markov process that operates in two modes: active and passive. At each time slot a controller needs to designate a subset of the arms to be active, of which the associated processes will evolve ...
Jing Fu +3 more

Adaptive learning of uncontrolled restless bandits with logarithmic regret

2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2011
In this paper we consider the problem of learning the optimal policy for the uncontrolled restless bandit problem. In this problem only the state of the selected arm can be observed, the state transitions are independent of control and the transition law is unknown. We propose a learning algorithm which gives logarithmic regret uniformly over time with ...
Cem Tekin, Mingyan Liu
