
Finite Model Approximations for Partially Observed Markov Decision Processes with Discounted Cost [PDF]

open access: yes, arXiv, 2017
We consider finite model approximations of discrete-time partially observed Markov decision processes (POMDPs) under the discounted cost criterion. After converting the original partially observed stochastic control problem to a fully observed one on the belief space, the finite models are obtained through the uniform quantization of the state and ...

On the adaptive control of a class of partially observed Markov decision processes

open access: yes, Journal of Mathematical Analysis and Applications, 2011
This paper is concerned with the adaptive control problem, over the infinite horizon, for partially observable Markov decision processes whose transition functions are parameterized by an unknown vector. We treat finite models and impose relatively mild assumptions on the transition function.
Shun-Pin Hsu, Ari Arapostathis

Deep Reinforcement Learning-Based Multi-Agent System with Advanced Actor–Critic Framework for Complex Environment

open access: yes, Mathematics
The development of artificial intelligence (AI) game agents that use deep reinforcement learning (DRL) algorithms to process visual information for decision-making has emerged as a key research focus in both academia and industry.
Zihao Cui   +4 more

Opportunistic Spectrum Access in Self-Similar Primary Traffic

open access: yes, EURASIP Journal on Advances in Signal Processing, 2009
We take a stochastic optimization approach to opportunity tracking and access in self-similar primary traffic. Based on a multiple time-scale hierarchical Markovian model, we formulate opportunity tracking and access in self-similar primary traffic as a ...
Xiangyang Xiao, Qing Zhao, Keqin Liu

Pulse‐Level Quantum Robust Control with Diffusion‐Based Reinforcement Learning

open access: yes, Advanced Physics Research, EarlyView.
This paper proposes a diffusion‐based reinforcement learning method for pulse‐level quantum robust control (PQC‐DBRL) to enhance the robustness of pulse‐level quantum gate control. When evaluated across different Hamiltonian variations, PQC‐DBRL shows a smaller fidelity variance compared to GRAPE, indicating higher robustness against system parameter ...
Yuanjing Zhang   +3 more

Linking uplift, erosion, and sedimentation using landscape evolution models: Madagascar since the Late Cretaceous

open access: yes, Earth Surface Processes and Landforms, Volume 48, Issue 1, Pages 215-229, January 2023
This paper uses a numerical landscape evolution model to reconstruct the topographic history of Madagascar since the Late Cretaceous. The model is optimised by balancing the volumes of onshore erosion and offshore sedimentation; the former is predicted with erosion laws and based on uplift history inferred from elevated planation surfaces.
Ruohong Jiao   +4 more

Bayesian and frequentist statistical models to predict publishing output and article processing charge totals

open access: yes, Journal of the Association for Information Science and Technology, EarlyView.
Academic libraries, institutions, and publishers are interested in predicting future publishing output to help evaluate publishing agreements. Current predictive models are overly simplistic and provide inaccurate predictions. This paper presents Bayesian and frequentist statistical models to predict future article counts and costs.
Philip M. Dixon, Eric Schares

Cooperative multi-target hunting by unmanned surface vehicles based on multi-agent reinforcement learning

open access: yes, Defence Technology, 2023
To solve the problem of multi-target hunting by an unmanned surface vehicle (USV) fleet, a hunting algorithm based on multi-agent reinforcement learning is proposed.
Jiawei Xia   +5 more

Partially Observable Markov Decision Process for Recommender Systems

open access: yes, 2016
We report the "Recurrent Deterioration" (RD) phenomenon observed in online recommender systems. The RD phenomenon appears as a trend of performance degradation when the recommendation model is always trained on users' feedback on the previous recommendations.
Zhongqi Lu, Qiang Yang
