Strategy Selection and Outcome Evaluation of Three-Way Decisions Based on Reinforcement Learning [PDF]
The trisecting-acting-outcome (TAO) model of three-way decision (3WD) consists of three steps: trisecting a whole, designing action strategies, and analyzing and measuring outcomes.
LIU Xiaoxue, JIANG Chunmao
doaj +1 more source
Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems [PDF]
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T.
Liu, Keqin, Vakili, Sattar, Zhao, Qing
core +2 more sources
Dopamine, reward learning, and active inference
Temporal difference learning models propose that phasic dopamine signalling encodes reward prediction errors that drive learning. This is supported by studies in which optogenetic stimulation of dopamine neurons can stand in lieu of actual reward.
Thomas Fitzgerald +3 more
doaj +1 more source
A Bayesian Account of Generalist and Specialist Formation Under the Active Inference Framework
This paper offers a formal account of policy learning, or habitual behavioral optimization, under the framework of Active Inference. In this setting, habit formation becomes an autodidactic, experience-dependent process, based upon what the agent sees ...
Anthony G. Chen +4 more
doaj +1 more source
A Markov chain Monte Carlo algorithm for Bayesian policy search
Policy search algorithms have facilitated application of Reinforcement Learning (RL) to dynamic systems, such as control of robots. Many policy search algorithms are based on the policy gradient, and thus may suffer from slow convergence or local optima ...
Vahid Tavakol Aghaei +2 more
doaj +1 more source
A Reinforcement Learning-Based Congestion Control Approach for V2V Communication in VANET
Vehicular ad hoc networks (VANETs) are crucial components of intelligent transportation systems (ITS) aimed at enhancing road safety and providing additional services to vehicles and their users.
Xiaofeng Liu +2 more
doaj +1 more source
A Markov Reward Process-Based Approach to Spatial Interpolation
The interpolation of spatial data can be of tremendous value in various applications, such as forecasting weather from only a few measurements of meteorological or remote sensing data. Existing methods for spatial interpolation, such as variants of kriging and spatial autoregressive models, tend to suffer from at least one of the following limitations:
openaire +2 more sources
Approximate receding horizon approach for Markov decision processes: average reward case
The authors consider an approximation scheme for solving Markov decision processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which they call "approximate receding horizon control".
Chang, Hyeong Soo, Marcus, Steven I.
openaire +2 more sources
Background: Reinforcement learning (RL) provides a promising technique for solving complex sequential decision-making problems in health care domains. To ensure such applications, an explicit reward function encoding domain knowledge should be specified ...
Chao Yu, Jiming Liu, Hongyi Zhao
doaj +1 more source
Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model. [PDF]
Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species.
Greg Jensen +4 more
doaj +1 more source

