Selective Reviews of Bandit Problems in AI via a Statistical View
Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment.
Pengjie Zhou, Haoyu Wei, Huiming Zhang
doaj +1 more source
Regret bounds for Narendra-Shapiro bandit algorithms [PDF]
Narendra-Shapiro (NS) algorithms are bandit-type algorithms that were introduced in the sixties (with a view to applications in psychology or learning automata), and whose convergence has been intensively studied in the stochastic algorithm literature ...
Gadat, Sébastien +2 more
core +2 more sources
The contested dynamics of slum gentrification in Rio de Janeiro came into focus during the brief period of relative peace brought by the pacification policy leading up to the 2016 Olympics. In this unprecedented moment, Rio's South Zone favela residents experienced a respite from the daily confrontations with police operations and drug trade violence ...
Angela Torresan
wiley +1 more source
Comparison of Video Recommendation Effects of ETC, UCB, and Thompson Sampling Algorithms on Short-Video Platforms [PDF]
This paper comprehensively compares the performance of three multi-armed bandit (MAB) algorithms, Explore-Then-Commit (ETC), upper confidence bound (UCB), and Thompson sampling (TS), for video recommendation in dynamic environments.
Li Shouchuan
doaj +1 more source
Dynamic Pricing With Recommendation and Consumer Feedback
A long‐lived seller sells a new product of unknown value by offering prices and recommendations to short‐lived consumers in continuous time. The seller receives consumer feedback about the product at a rate that increases with the instantaneous sales volume.
Wenji Xu, Shuoguang Yang
wiley +1 more source
Carbon‐Aware Scheduling in Cloud Computing Operations: A Multi‐Objective Optimisation Approach
A dynamic carbon‐aware scheduler that uses forecasts, optimisation and rolling‐horizon control minimises cost and emissions by shifting workloads across time and region to match renewables. It combines LSTM + boosting forecasts with a MILP solved via a constraint‐based multi‐objective method.
Kassem Danach +3 more
wiley +1 more source
To enhance the anti-jamming capability of the aeronautic swarm tactical network in a complicated electromagnetic environment, we address the problem of a bandit-based cognitive anti-jamming strategy for enabling reliable information transmission. We
Haitao Li, Jiawei Luo, Changjun Liu
doaj +1 more source
Multi-armed bandit problem with precedence relations
Consider a multi-phase project management problem where the decision maker needs to deal with two issues: (a) how to allocate resources to projects within each phase, and (b) when to enter the next phase, so that the total expected reward is as large as ...
Chan, Hock Peng +2 more
core +1 more source
The attack‐and‐defense conflict with the gun‐and‐butter dilemma
We analyze a general equilibrium model of attack and defense with production. One attacker and one defender allocate fixed endowments between producing butter and guns. We characterize the unique interior and unique corner equilibrium, and find that (i) the defender may spend more resources on conflict than the attacker even without loss ...
Subhasish M. Chowdhury, Iryna Topolyan
wiley +1 more source
Energy efficiency is the major concern in hierarchical wireless sensor networks (WSNs), where the major energy consumption originates from radios for communication.
Jian Zhang, Jian Tang, Feng Wang
semanticscholar +1 more source

