Game theory was created on the basis of social as well as gambling games, such as chess, poker, baccarat, hex, or the one-armed bandit. These games laid solid foundations for analogous mathematical models (e.g., hex), artificial intelligence ...
Ewa Drabik
doaj
Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms [PDF]
Tiancheng Jin, Junyan Liu, Haipeng Luo
openalex
Spectrum Allocation and User Scheduling Based on Combinatorial Multi-Armed Bandit for 5G Massive MIMO. [PDF]
Dou J, Liu X, Qie S, Li J, Wang C.
europepmc
Stationary Multi Choice Bandit Problems [PDF]
This note shows that the optimal choice of k simultaneous experiments in a stationary multi-armed bandit problem can be characterized in terms of the Gittins index of each arm.
Dirk Bergemann, Juuso Välimäki
core
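The characterization above reduces the choice of k simultaneous experiments to ranking arms by their index and playing the top k. A minimal sketch in plain Python; the `indices` values here are illustrative stand-ins for the actual Gittins indices, which the note characterizes but which are nontrivial to compute:

```python
import heapq

def select_top_k(indices, k):
    """Pick the k arms with the largest index values.

    `indices` maps arm id -> index value (in the bandit setting these
    would be Gittins indices; any scalar score works here).
    """
    return heapq.nlargest(k, indices, key=indices.get)

# Illustrative (made-up) index values:
arms = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.2}
select_top_k(arms, 2)  # -> ["a", "c"]
```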
A Comparative Study of UCB and Thompson Sampling with Structured Rewards: Parameter Sensitivity and Robustness [PDF]
The behavior of multi-armed bandit (MAB) algorithms is closely tied to how their hyperparameters are set, but their stability in structured reward environments has not been examined in depth.
Yutong Chen
doaj
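A hedged sketch of the two algorithm families such a sensitivity study would compare: UCB1 with an explicit exploration weight `c` (the kind of hyperparameter being varied), and Beta-Bernoulli Thompson sampling. The arm means and horizon below are illustrative, not taken from the study:

```python
import math
import random

def ucb1(means, horizon, c=2.0, rng=random):
    """UCB1 on Bernoulli arms; `c` is the exploration weight."""
    n_arms = len(means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play each arm once to initialize
        else:
            # empirical mean plus confidence radius
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(c * math.log(t) / counts[a]))
        r = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r
        reward += r
    return reward

def thompson(means, horizon, rng=random):
    """Beta-Bernoulli Thompson sampling with a Beta(1, 1) prior."""
    n_arms = len(means)
    a = [1.0] * n_arms  # posterior successes + 1
    b = [1.0] * n_arms  # posterior failures + 1
    reward = 0.0
    for _ in range(horizon):
        # play the arm whose posterior sample is largest
        arm = max(range(n_arms), key=lambda i: rng.betavariate(a[i], b[i]))
        r = 1.0 if rng.random() < means[arm] else 0.0
        a[arm] += r
        b[arm] += 1.0 - r
        reward += r
    return reward
```

Varying `c` (or the prior parameters) and re-running on fixed reward structures is the basic shape of such a robustness comparison.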
GNU Radio Implementation of MALIN: “Multi-Armed bandits Learning for Internet-of-things Networks” [PDF]
Lilian Besson +2 more
openalex
muMAB: A Multi-Armed Bandit Model for Wireless Network Selection
Multi-armed bandit (MAB) models are a viable approach to describing the problem of best wireless network selection by a multi-Radio Access Technology (multi-RAT) device, with the goal of maximizing the quality perceived by the final user. The classical MAB ...
Stefano Boldrini +5 more
doaj
On Kernelized Multi-armed Bandits
We consider the stochastic bandit problem with a continuous set of arms, where the expected reward function over the arms is assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization: Improved GP-UCB (IGP-UCB) and GP-Thompson sampling (GP-TS), and derive corresponding regret bounds. Specifically, ...
Chowdhury, Sayak Ray, Gopalan, Aditya
openaire
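A rough sketch of the GP-UCB idea the abstract describes: at each round, play the arm maximizing the Gaussian process posterior mean plus a scaled posterior standard deviation, over a discretized arm set. This assumes an RBF kernel and plain-Python linear algebra; `beta`, `noise`, the length scale, and the grid are illustrative choices, not the paper's IGP-UCB constants:

```python
import math
import random

def rbf(x, y, ls=0.3):
    """RBF (squared-exponential) kernel with length scale `ls`."""
    return math.exp(-(x - y) ** 2 / (2 * ls ** 2))

def solve(A, rhs):
    """Solve A v = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v

def gp_ucb(f, grid, horizon, beta=2.0, noise=0.1, rng=random):
    """Query f at the grid point maximizing mean + beta * std each round."""
    X, Y = [], []
    for _ in range(horizon):
        if X:
            # kernel matrix of past queries, with noise on the diagonal
            K = [[rbf(xi, xj) + (noise ** 2 if i == j else 0.0)
                  for j, xj in enumerate(X)] for i, xi in enumerate(X)]
            alpha = solve(K, Y)  # K^{-1} y, reused for every candidate
        best, best_x = -float("inf"), grid[0]
        for x in grid:
            if X:
                k = [rbf(x, xi) for xi in X]
                mu = sum(ki * ai for ki, ai in zip(k, alpha))
                w = solve(K, k)
                var = max(1.0 - sum(ki * wi for ki, wi in zip(k, w)), 1e-12)
            else:
                mu, var = 0.0, 1.0  # prior before any observations
            score = mu + beta * math.sqrt(var)
            if score > best:
                best, best_x = score, x
        X.append(best_x)
        Y.append(f(best_x) + rng.gauss(0.0, noise))
    return X
```

GP-TS would differ only in the acquisition step: sample a function value from the posterior at each candidate instead of computing the upper confidence score.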
Adversarial Autoencoder and Multi-Armed Bandit for Dynamic Difficulty Adjustment in Immersive Virtual Reality for Rehabilitation: Application to Hand Movement. [PDF]
Kamikokuryo K +3 more
europepmc

