Some of the following articles may not be open access.
An online algorithm for the risk-aware restless bandit
European Journal of Operational Research, 2021
Jianyu Xu, Lujie Chen, Ou Tang
Group Maintenance: A Restless Bandits Approach
INFORMS Journal on Computing, 2019
We consider a maintenance planner problem to dynamically allocate the available repairmen to a system of unreliable production facilities. Each facility has several machines that incur a linear production loss due to stochastic degradation, which we model as a continuous time Markov process with fully observable states.
Abderrahmane Abbou, Viliam Makis
Restless Hidden Markov Bandit with Linear Rewards
2020 59th IEEE Conference on Decision and Control (CDC), 2020
This paper presents an algorithm and regret analysis for the restless hidden Markov bandit problem with linear rewards. In this problem the reward received by the decision maker is a random linear function which depends on the arm selected and a hidden state.
Michal Yemini +2 more
Optimal target tracking with restless bandits
Digital Signal Processing, 2006
This paper examines the problem of adaptive beam scheduling to minimise target tracking error with a phased array radar. It is shown that this can be posed in a framework that is similar to a particular type of dynamic programming problem known as the restless bandit problem. We will show that when the problem is put in this framework it has ...
Barbara F. La Scala, William Moran
On an index policy for restless bandits
Journal of Applied Probability, 1990
We investigate the optimal allocation of effort to a collection of n projects. The projects are ‘restless’ in that the state of a project evolves in time, whether or not it is allocated effort. The evolution of the state of each project follows a Markov rule, but transitions and rewards depend on whether or not the project receives effort.
Weber, Richard R., Weiss, Gideon
Wireless Channel Selection with Restless Bandits
2017
Wireless devices are often able to communicate on several alternative channels; for example, cellular phones may use several frequency bands and are equipped with base-station communication capability together with WiFi and Bluetooth communication. Automatic decision support systems in such devices need to decide which channels to use at any given time ...
Kuhn, Julia, Nazarathy, Yoni
Index policies for a class of discounted restless bandits
Advances in Applied Probability, 2002
The paper concerns a class of discounted restless bandit problems which possess an indexability property. Conservation laws yield an expression for the reward suboptimality of a general policy. These results are utilised to study the closeness to optimality of an index policy for a special class of simple and natural dual speed restless bandits for ...
Glazebrook, K. D. +2 more
Restless bandits that hide their hand and recommendation systems
2017 9th International Conference on Communication Systems and Networks (COMSNETS), 2017
We consider a restless multi-armed bandit (RMAB) in which each arm can be in one of two states, say 0 or 1. Playing the arm brings it to state 0 with probability one and not playing it induces state transitions with arm-dependent probabilities. Playing an arm generates a unit reward with a probability that depends on the state of the arm.
Rahul Meshram +2 more
Towards Q-learning the Whittle Index for Restless Bandits
2019 Australian & New Zealand Control Conference (ANZCC), 2019
We consider the multi-armed restless bandit problem (RMABP) with an infinite horizon average cost objective. Each arm of the RMABP is associated with a Markov process that operates in two modes: active and passive. At each time slot a controller needs to designate a subset of the arms to be active, of which the associated processes will evolve ...
Jing Fu +3 more
Adaptive learning of uncontrolled restless bandits with logarithmic regret
2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2011
In this paper we consider the problem of learning the optimal policy for the uncontrolled restless bandit problem. In this problem only the state of the selected arm can be observed, the state transitions are independent of control and the transition law is unknown. We propose a learning algorithm which gives logarithmic regret uniformly over time with ...
Cem Tekin, Mingyan Liu

