Safety-Aware Apprenticeship Learning
Apprenticeship learning (AL) is a kind of Learning from Demonstration technique in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent has to derive a good policy by observing an expert's ...
A Solar-Lezama +10 more
core +1 more source
Model Checking Optimal Infinite-Horizon Control for Probabilistic Gene Regulatory Networks
Genetic regulatory networks (GRNs) are fundamental biological networks through which the functions of biological systems are regulated. A significant challenge in systems biology is the construction of a control theory of GRNs through ...
Lisong Wang +4 more
doaj +1 more source
Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures
The problem of making sequential decisions in unknown probabilistic environments is studied. In cycle $t$ action $y_t$ results in perception $x_t$ and reward $r_t$, where all quantities in general may depend on the complete history.
Hutter, Marcus
core +4 more sources
Feature Reinforcement Learning: Part I: Unstructured MDPs
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov
Hutter, Marcus
core +3 more sources
Modified Index Policies for Multi-Armed Bandits with Network-like Markovian Dependencies
Sequential decision-making in dynamic and interconnected environments is a cornerstone of numerous applications, ranging from communication networks and finance to distributed blockchain systems and IoT frameworks. The multi-armed bandit (MAB) problem is
Abdalaziz Sawwan, Jie Wu
doaj +1 more source
A Linear Programming Approach to Error Bounds for Random Walks in the Quarter-plane
We consider the approximation of the performance of random walks in the quarter-plane. The approximation is in terms of a random walk with a product-form stationary distribution, which is obtained by perturbing the transition probabilities along the ...
Boucherie, Richard J. +2 more
core
Dynamic Service Composition Method Based on Zero-Sum Game Integrated Inverse Reinforcement Learning
Automatically generating service composition solutions that meet user application requirements is an active research topic in service composition for Web service big data. To address the challenges of accurately obtaining
Yuan Yuan, Yuhan Guo, Wanqing Ma
doaj +1 more source
Dynamic Multi-Arm Bandit Game Based Multi-Agents Spectrum Sharing Strategy Design
For a wireless avionics communication system, a multi-arm bandit game comprising channel states, strategies, and rewards is formulated mathematically.
Blasch, Erik +6 more
core +1 more source
Bird’s Eye View feature selection for high-dimensional data
In machine learning, an informative dataset is crucial for accurate predictions. However, high dimensional data often contains irrelevant features, outliers, and noise, which can negatively impact model performance and consume computational resources. To
Samir Brahim Belhaouari +4 more
doaj +1 more source
Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters
Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not known ...
Buchholz, Peter +3 more
core +1 more source

