Markov reward approach - Open Access .click

2018
Apprenticeship learning (AL) is a kind of Learning from Demonstration technique where the reward function of a Markov Decision Process (MDP) is unknown to the learning agent and the agent has to derive a good policy by observing an expert's ...
A Solar-Lezama +10 more
core +1 more source

Model Checking Optimal Infinite-Horizon Control for Probabilistic Gene Regulatory Networks

IEEE Access, 2018
Genetic regulatory networks (GRNs) are significant fundamental biological networks through which biological system functions can be regulated. A significant challenge in the field of system biology is the construction of a control theory of GRNs through ...
Lisong Wang +4 more
doaj +1 more source

Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures

2002
The problem of making sequential decisions in unknown probabilistic environments is studied. In cycle $t$ action $y_t$ results in perception $x_t$ and reward $r_t$, where all quantities in general may depend on the complete history.
Hutter, Marcus
core +4 more sources

Feature Reinforcement Learning: Part I: Unstructured MDPs [PDF]

2009
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov
Hutter, Marcus
core +3 more sources

Modified Index Policies for Multi-Armed Bandits with Network-like Markovian Dependencies

Network
Sequential decision-making in dynamic and interconnected environments is a cornerstone of numerous applications, ranging from communication networks and finance to distributed blockchain systems and IoT frameworks. The multi-armed bandit (MAB) problem is
Abdalaziz Sawwan, Jie Wu
doaj +1 more source

A Linear Programming Approach to Error Bounds for Random Walks in the Quarter-plane [PDF]

2014
We consider the approximation of the performance of random walks in the quarter-plane. The approximation is in terms of a random walk with a product-form stationary distribution, which is obtained by perturbing the transition probabilities along the ...
Boucherie, Richard J. +2 more
core

Dynamic Service Composition Method Based on Zero-Sum Game Integrated Inverse Reinforcement Learning

IEEE Access, 2023
Automatically generating service composition solutions that meet user application requirements is one of the hot research topics in the field of service composition in the context of Web service big data. To address the challenges of accurately obtaining
Yuan Yuan, Yuhan Guo, Wanqing Ma
doaj +1 more source

Dynamic Multi-Arm Bandit Game Based Multi-Agents Spectrum Sharing Strategy Design

2017
For a wireless avionics communication system, a Multi-arm bandit game is mathematically formulated, which includes channel states, strategies, and rewards.
Blasch, Erik +6 more
core +1 more source

Bird’s Eye View feature selection for high-dimensional data

Scientific Reports, 2023
In machine learning, an informative dataset is crucial for accurate predictions. However, high dimensional data often contains irrelevant features, outliers, and noise, which can negatively impact model performance and consume computational resources. To
Samir Brahim Belhaouari +4 more
doaj +1 more source

Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters

2017
Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not known ...
Buchholz, Peter +3 more
core +1 more source
