Results 211 to 220 of about 68,246 (249)
Some of the following articles may not be open access.
Constrained Semi-Markov decision processes with average rewards
ZOR Zeitschrift für Operations Research Mathematical Methods of Operations Research, 1994
This paper deals with constrained average reward semi-Markov decision processes with finite state and action sets. Two average reward criteria are considered, namely time average and ratio average. The author proved the existence of optimal mixed stationary policies and showed, under the unichain condition, the existence of randomized stationary ...
openaire +2 more sources
Semi-Markov decision processes with polynomial reward
Journal of Applied Probability, 1982
A semi-Markov decision process, with a denumerable multidimensional state space, is considered. At any given state only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is merely assumed to be bounded by a polynomial, and a bound is imposed on a weighted moment of the next state reached ...
openaire +1 more source
Discrete-time equivalence for constrained semi-Markov decision processes
1985 24th IEEE Conference on Decision and Control, 1985
A continuous-time average reward Markov decision process problem is most easily solved in terms of an equivalent discrete-time Markov decision process (DMDP); customary hypotheses include that the process is a Markov jump process with denumerable state space and bounded transition rates, that actions are chosen at the jump points of the process, and ...
Frederick Beutler, Keith Ross
openaire +1 more source
Constrained Discounted Semi-Markov Decision Processes
2002
This paper reduces the problems of the existence and computation of optimal policies for multiple-criterion discounted SMDPs to similar problems for MDPs. We prove this reduction and illustrate it by extending to SMDPs several results for constrained discounted MDPs.
openaire +1 more source
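A minimal numerical illustration in the spirit of the SMDP-to-MDP reduction described above, under an assumption the paper does not require (exponentially distributed sojourn times, with rates and discount chosen arbitrarily here): continuous discounting at rate alpha over a random sojourn tau yields an effective per-step discount factor E[exp(-alpha * tau)], which for a rate-lam exponential sojourn equals lam / (lam + alpha).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, lam = 0.5, 2.0  # continuous discount rate and sojourn-time rate (illustrative values)

# Effective one-step discount factor of the equivalent discrete-time MDP:
# E[exp(-alpha * tau)] for tau ~ Exponential(lam) is lam / (lam + alpha).
tau = rng.exponential(1.0 / lam, size=200_000)
beta_mc = np.exp(-alpha * tau).mean()       # Monte Carlo estimate
beta_exact = lam / (lam + alpha)            # closed form, here 2.0 / 2.5 = 0.8
```

In a general SMDP this discount factor depends on the state and action through the sojourn-time distribution, which is what makes the equivalent MDP have state-action-dependent discounting.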
Uniformization for semi-Markov decision processes under stationary policies
Journal of Applied Probability, 1987
Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in ...
Beutler, Frederick J., Ross, Keith W.
openaire +2 more sources
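As a rough sketch of uniformization for a fixed simple policy (not the paper's construction, and with an invented two-state generator): given the generator Q of the induced continuous-time chain, any uniformization rate Lam >= max_i |Q[i, i]| yields a discrete-time chain P = I + Q / Lam with the same stationary distribution, and hence the same long-run average reward for that policy.

```python
import numpy as np

# Generator (rate) matrix of a small continuous-time Markov chain under a
# fixed policy; rows sum to zero. Values are illustrative, not from the paper.
Q = np.array([[-2.0,  2.0],
              [ 1.0, -1.0]])

Lam = 2.5                      # uniformization rate, must satisfy Lam >= max_i |Q[i, i]|
P = np.eye(2) + Q / Lam        # transition matrix of the uniformized discrete-time chain

# The stationary distribution of P coincides with that of the original chain.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()             # normalize the eigenvector for eigenvalue 1
```

The anomalies the paper discusses arise for randomized stationary policies, where the virtual self-jumps introduced by uniformization create extra action-resampling points; the fixed-policy case above is the benign one.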
Reinforcement learning for semi-Markov decision processes with applications
2023
This thesis focuses on semi-Markov decision processes and their connection with Reinforcement Learning via the Q-learning technique. We start by discussing some general ideas around Machine Learning, Reinforcement Learning and Hierarchical Reinforcement Learning.
openaire +1 more source
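A toy sketch of Q-learning adapted to an SMDP, in the spirit of the thesis topic (the two-state dynamics, rewards, and step sizes below are invented for illustration): the only change from ordinary Q-learning is that the bootstrap term is discounted by gamma**tau, where tau is the observed sojourn time of the transition.

```python
import random

# Hypothetical two-state, two-action SMDP: each (state, action) yields a
# lump-sum reward proportional to the sojourn time and a deterministic next state.
def step(s, a, rng):
    tau = rng.expovariate(1.0) + 0.1           # random sojourn time > 0
    r = [[1.0, 0.0], [0.0, 2.0]][s][a] * tau   # reward accrued over the sojourn
    s_next = (s + a) % 2                        # action 1 switches state
    return r, tau, s_next

gamma, alpha = 0.9, 0.1
Q = [[0.0, 0.0], [0.0, 0.0]]
rng = random.Random(0)
s = 0
for _ in range(5000):
    # epsilon-greedy action selection
    a = rng.randrange(2) if rng.random() < 0.2 else max((0, 1), key=lambda x: Q[s][x])
    r, tau, s2 = step(s, a, rng)
    # SMDP Q-learning update: discount the bootstrap term by gamma**tau
    target = r + (gamma ** tau) * max(Q[s2])
    Q[s][a] += alpha * (target - Q[s][a])
    s = s2
```

With time-independent rewards and unit sojourns this collapses to the standard Q-learning update, which is the connection the thesis develops.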
Performance Optimization of Semi-Markov Decision Processes with Discounted-cost Criteria
European Journal of Control, 2008
Yin, Baoqun +3 more
openaire +1 more source
Finite horizon semi-Markov decision processes with multiple constraints
Proceedings of the 11th World Congress on Intelligent Control and Automation, 2014
This paper focuses on solving a finite horizon semi-Markov decision process with multiple constraints. We convert the problem to a constrained absorbing discrete-time Markov decision process and then to an equivalent linear program over a class of occupancy measures.
openaire +1 more source
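A small sketch of the occupancy-measure linear program for a constrained discrete-time MDP, in the spirit of the conversion described above (the two-state dynamics, rewards, costs, and budget are invented, and an infinite-horizon discounted formulation is used rather than the paper's finite-horizon absorbing construction): the variables x(s, a) are discounted state-action occupancies, flow-conservation equalities encode the dynamics, and the extra cost constraint becomes a single linear inequality.

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
mu = np.array([1.0, 0.0])                # initial state distribution
# P[a, s, s']: action 0 stays put, action 1 switches state (toy dynamics)
P = np.array([[[1., 0.], [0., 1.]],
              [[0., 1.], [1., 0.]]])
r = np.array([[0., 0.], [1., 1.]])       # r[s, a]: reward 1 for being in state 1
cost = np.array([[0., 1.], [0., 1.]])    # cost[s, a]: unit cost for the "switch" action
C = 0.2                                  # budget on expected discounted cost

nS, nA = 2, 2
idx = lambda s, a: s * nA + a

# Flow-conservation (occupancy-measure) constraints:
#   sum_a x(s', a) - gamma * sum_{s,a} P(s'|s,a) x(s, a) = (1 - gamma) mu(s')
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, idx(s, a)] = (s == sp) - gamma * P[a, s, sp]
b_eq = (1 - gamma) * mu

res = linprog(c=-r.flatten(),                     # maximize discounted reward
              A_ub=[cost.flatten()], b_ub=[C],    # cost budget as one inequality
              A_eq=A_eq, b_eq=b_eq)               # x >= 0 is linprog's default
x = res.x.reshape(nS, nA)                         # optimal occupancy measure
```

An optimal (possibly randomized) stationary policy is then recovered as pi(a|s) proportional to x(s, a), which is why constrained problems of this kind admit LP solutions even when no deterministic policy is optimal.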
Learning Automaton for Finite Semi-Markov Decision Processes
1983
A finite semi-Markov decision process is studied to maximize the expected average reward. The semi-Markov kernel of the process depends on an unknown parameter taking values in a subset [a, b] of ℝ^S. A controller modelled as a learning automaton sequentially updates the probabilities of generating decisions based on the observed decisions, states, and ...
openaire +1 more source
Semi-Markov decision processes with a reachable state-subset
Optimization, 1989
We consider the problem of minimizing the long-run average expected cost per unit time in a semi-Markov decision process with arbitrary state and action spaces. Assuming the existence of a Borel subset of the state space called a reachable state-subset, we derive the optimality equation for unbounded costs.
openaire +1 more source

