Markov reward approach - Open Access .click


Strategy Selection and Outcome Evaluation of Three-Way Decisions Based on Reinforcement Learning

Jisuanji kexue yu tansuo
The trisecting-acting-outcome (TAO) model of three-way decision (3WD) consists of three steps: trisecting a whole, designing action strategies, and analyzing and measuring outcomes.
LIU Xiaoxue, JIANG Chunmao
doaj +1 more source

Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

2013
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T.
Liu, Keqin, Vakili, Sattar, Zhao, Qing
core +2 more sources
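
The bandit setting described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it assumes Bernoulli arms, and the exploration schedule (keep the number of exploration steps near `k * ceil(c * log(t + 1))`), the function name `run_bandit`, and the parameter `c` are all hypothetical choices made for illustration. Exploration steps cycle through the arms in a fixed order; all other steps play the arm with the highest sample mean.

```python
import math
import random


def run_bandit(means, horizon, c=2.0, seed=0):
    """Simulate Bernoulli arms under a fixed explore/exploit schedule.

    `means` holds the (unknown to the player) success probabilities.
    `c` must be positive so that every arm is explored at least once.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k       # number of times each arm was played
    totals = [0.0] * k    # summed reward collected from each arm
    explored = 0          # exploration steps taken so far
    reward_sum = 0.0

    for t in range(1, horizon + 1):
        # Assumed schedule: exploration count grows logarithmically in t.
        if explored < k * math.ceil(c * math.log(t + 1)):
            arm = explored % k  # round-robin over the arms
            explored += 1
        else:
            # Exploit: play the arm with the highest sample mean so far.
            arm = max(range(k), key=lambda a: totals[a] / pulls[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
        reward_sum += reward

    return pulls, reward_sum
```

Calling `run_bandit([0.1, 0.9], 2000)` returns the per-arm play counts and the total reward collected over the horizon; because the schedule depends only on the time index, the split between exploration and exploitation steps is fixed in advance.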

Dopamine, reward learning, and active inference

Frontiers in Computational Neuroscience, 2015
Temporal difference learning models propose that phasic dopamine signalling encodes reward prediction errors that drive learning. This is supported by studies where optogenetic stimulation of dopamine neurons can stand in lieu of actual reward.
Thomas FitzGerald, Ray Dolan, Karl Friston +3 more
doaj +1 more source
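
For reference, the reward prediction error mentioned in this abstract is usually written in the standard TD(0) form below. This is the textbook formulation, not necessarily the one used in the paper; the symbols (state value V, reward r_t, discount factor γ, learning rate α) are the conventional ones and are assumptions here.

```latex
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t), \qquad
V(s_t) \leftarrow V(s_t) + \alpha \, \delta_t
```

A positive \delta_t means the outcome was better than predicted, so the value estimate for s_t is raised; a negative \delta_t lowers it.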

A Bayesian Account of Generalist and Specialist Formation Under the Active Inference Framework

Frontiers in Artificial Intelligence, 2020
This paper offers a formal account of policy learning, or habitual behavioral optimization, under the framework of Active Inference. In this setting, habit formation becomes an autodidactic, experience-dependent process, based upon what the agent sees ...
Anthony G. Chen +4 more
doaj +1 more source

A Markov chain Monte Carlo algorithm for Bayesian policy search

Systems Science & Control Engineering, 2018
Policy search algorithms have facilitated application of Reinforcement Learning (RL) to dynamic systems, such as control of robots. Many policy search algorithms are based on the policy gradient, and thus may suffer from slow convergence or local optima ...
Vahid Tavakol Aghaei, Ahmet Onat, Sinan Yıldırım +2 more
doaj +1 more source

A Reinforcement Learning-Based Congestion Control Approach for V2V Communication in VANET

Applied Sciences, 2023
Vehicular ad hoc networks (VANETs) are crucial components of intelligent transportation systems (ITS) aimed at enhancing road safety and providing additional services to vehicles and their users.
Xiaofeng Liu, Ben St. Amour, Arunita Jaekel +2 more
doaj +1 more source

A Markov Reward Process-Based Approach to Spatial Interpolation

2021
The interpolation of spatial data can be of tremendous value in various applications, such as forecasting weather from only a few measurements of meteorological or remote sensing data. Existing methods for spatial interpolation, such as variants of kriging and spatial autoregressive models, tend to suffer from at least one of the following limitations:
openaire +2 more sources

Approximate receding horizon approach for Markov decision processes: average reward case

Journal of Mathematical Analysis and Applications, 2003
The authors consider an approximation scheme for solving Markov decision processes (MDPs) with countable state space, finite action space, and bounded rewards that uses an approximate solution of a fixed finite-horizon sub-MDP of a given infinite-horizon MDP to create a stationary policy, which they call "approximate receding horizon control".
Chang, Hyeong Soo, Marcus, Steven I.
openaire +2 more sources

Inverse reinforcement learning for intelligent mechanical ventilation and sedative dosing in intensive care units

BMC Medical Informatics and Decision Making, 2019
Background: Reinforcement learning (RL) provides a promising technique to solve complex sequential decision making problems in health care domains. To ensure such applications, an explicit reward function encoding domain knowledge should be specified ...
Chao Yu, Jiming Liu, Hongyi Zhao
doaj +1 more source

Implicit Value Updating Explains Transitive Inference Performance: The Betasort Model

PLoS Computational Biology, 2015
Transitive inference (the ability to infer that B > D given that B > C and C > D) is a widespread characteristic of serial learning, observed in dozens of species.
Greg Jensen +4 more
doaj +1 more source
