A (2/3)n³ Fast-Pivoting Algorithm for the Gittins Index and Optimal Stopping of a Markov Chain
This paper presents a new fast-pivoting algorithm that computes the n Gittins index values of an n-state bandit, in the discounted and undiscounted cases, by performing (2/3)n³ + O(n²) arithmetic operations, thus attaining better complexity than previous ...
José Niño-Mora
core
Marginal productivity index policies for scheduling a multiclass delay-/loss-sensitive queue [PDF]
We address the problem of scheduling a multiclass M/M/1 queue with a finite dedicated buffer for each class. Some classes are delay-sensitive, modeling real-time traffic (e.g.
Niño-Mora, José +2 more
core
On the Behavior of Proposers in Ultimatum Games [PDF]
We demonstrate that one should not expect convergence of the proposals to the subgame perfect Nash equilibrium offer in standard ultimatum games. First, imposing strict experimental control of the behavior of the receiving players and focusing on the ...
Nicolaas J. Vriend, Thomas Brenner
core +2 more sources
Q-Learning for Bandit Problems
Multi-armed bandits may be viewed as decompositionally-structured Markov decision processes (MDP's) with potentially very large state sets. A particularly elegant methodology for computing optimal policies was developed over twenty years ago by Gittins ...
Michael O. Duff
core +1 more source
The Learning Component of Dynamic Allocation Indices
For a multiarmed bandit problem with exponential discounting the optimal allocation rule is defined by a dynamic allocation index defined for each arm on its space.
Gittins, J., Wang, Y-G., Wang, Y. G.
core +1 more source
This paper examines a class of singular stochastic control problems with convex objective functions. In Section 2, we use tools from convex analysis to derive necessary and sufficient first order conditions for this class of optimisation problems. The main result of this paper is Theorem 9 which uses results from optimal stopping to establish the link ...
openaire +2 more sources
MARGINAL PRODUCTIVITY INDEX POLICIES FOR SCHEDULING A MULTICLASS DELAY-/LOSS-SENSITIVE QUEUE [PDF]
We address the problem of scheduling a multiclass M/M/1 queue with a finite dedicated buffer for each class. Some classes are delay-sensitive, modeling real-time traffic (e.g.
Jose Niño-Mora
core
Stationary Multi Choice Bandit Problems [PDF]
This note shows that the optimal choice of k simultaneous experiments in a stationary multi-armed bandit problem can be characterized in terms of the Gittins index of each arm.
Dirk Bergemann, Juuso Välimäki
core
The performance of forwards induction policies
Following major theoretical advances in the study of multi-armed bandit problems, Gittins proposed a forwards induction (FI) approach to the development of policies for Markov decision processes (MDP's).
Gittins, J. C., Glazebrook, K. D.
core
Stochastic scheduling: A short history of index policies and some key recent developments
A multi-armed bandit problem classically concerns N >= 2 populations of rewards whose statistical properties are unknown (or at least only partly known).
Minty, John +3 more
core

