Results 251 to 260 of about 48,213 (289)
Some of the following articles may not be open access.
2007
In Chapter 2, we introduced the basic principles of PA and derived the performance derivative formulas for queueing networks and Markov and semi-Markov systems with these principles. In Chapter 3, we developed sample-path-based (on-line learning) algorithms for estimating the performance derivatives and sample-path-based optimization schemes.
2021
As discussed in Chapter 1, reinforcement learning involves sequential decision-making. In this chapter, we will formalize the notion of using stochastic processes under the branch of probability that models sequential decision-making behavior. While most of the problems we study in reinforcement learning are modeled as Markov decision processes (MDP ...
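As a point of reference for the formalization described in the entry above, an MDP is commonly written as a tuple with its Bellman optimality equation; this is the standard textbook definition, not a quotation from the chapter itself:

\[
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr(S_{t+1} = s' \mid S_t = s,\, A_t = a),
\]
\[
V^{*}(s) = \max_{a \in \mathcal{A}} \Big[\, R(s,a) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s,a)\, V^{*}(s') \,\Big].
\]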
2009
Markov chains provide a useful modeling tool for determining expected profits or costs associated with certain types of systems. The key characteristic that allows for a Markov model is a probability law in which the future behavior of the system is independent of the past behavior given the present condition of the system. When this Markov property is
Richard M. Feldman +1 more
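The expected-cost calculation alluded to in the entry above can be illustrated with a minimal sketch: under the Markov property, the expected total discounted cost from each state satisfies v = c + βPv. The transition matrix, cost vector, and discount factor below are invented for illustration and are not taken from the cited chapter.

```python
import numpy as np

# Hypothetical 3-state chain: P[i, j] = probability of moving from state i to j.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])
c = np.array([1.0, 4.0, 10.0])   # hypothetical per-state one-step costs
beta = 0.95                      # discount factor

# Markov property => v = c + beta * P v, a linear system solved directly.
v = np.linalg.solve(np.eye(3) - beta * P, c)
print(v)  # expected total discounted cost starting from each state
```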
2013
We provide a formal description of the discounted reward MDP framework in Chap. 1, including both the finite- and the infinite-horizon settings and summarizing the associated optimality equations. We then present the well-known exact solution algorithms, value iteration and policy iteration, and outline a framework of rolling-horizon control (also ...
Hyeong Soo Chang +3 more
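Since the entry above centers on value iteration and policy iteration for the discounted-reward MDP, a compact value-iteration sketch may be useful; the array layout and names here are assumptions for illustration, and the book's own presentation may differ in notation.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Value iteration for a finite discounted MDP.

    P: shape (A, S, S), P[a, s, s'] = transition probability under action a
    R: shape (A, S),    R[a, s]     = expected one-step reward
    Returns the optimal value function and a greedy policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * P @ V
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```

A policy-iteration variant would instead alternate policy evaluation (solving the linear system for the current policy's value) with greedy policy improvement.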
European Journal of Operational Research, 1989
The paper is an introduction to Markov decision processes mainly addressed to possible applicants. Therefore it presents a finite model only, but a broad variety of objectives, algorithms (e.g. aggregation), and extensions (e.g. semi-Markov, partially observed, adaptive multiobjective, and constrained models).
White, Chelsea C. III, White, Douglas J.
Variance-Penalized Markov Decision Processes
Mathematics of Operations Research, 1989
We consider a Markov decision process with both the expected limiting average, and the discounted total return criteria, appropriately modified to include a penalty for the variability in the stream of rewards. In both cases we formulate appropriate nonlinear programs in the space of state-action frequencies (averaged, or discounted) whose optimal ...
Filar, Jerzy A. +2 more
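In the average-reward case, the nonlinear program over state-action frequencies mentioned in the abstract is often written along the following lines; this is a generic reconstruction of a variance-penalized objective, not the paper's exact statement:

\[
\max_{x \ge 0}\;\; \sum_{s,a} x_{sa}\, r(s,a) \;-\; \lambda \Big( \sum_{s,a} x_{sa}\, r(s,a)^2 - \big( \textstyle\sum_{s,a} x_{sa}\, r(s,a) \big)^2 \Big)
\]
subject to the usual long-run frequency constraints
\[
\sum_{a} x_{ja} = \sum_{s,a} x_{sa}\, p(j \mid s,a) \quad \forall j, \qquad \sum_{s,a} x_{sa} = 1,
\]
where \(x_{sa}\) is the long-run frequency of visiting state \(s\) and taking action \(a\), and \(\lambda \ge 0\) weights the variability penalty.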
2015
This chapter introduces sequential decision problems, in particular Markov decision processes (MDPs). A formal definition of an MDP is given, and the two most common solution techniques are described: value iteration and policy iteration. Then, factored MDPs are described, which provide a representation based on graphical models to solve very large ...
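The "representation based on graphical models" referred to above typically means that the next state decomposes into variables whose transition probabilities depend only on a small set of parent variables; a generic form (not the chapter's exact notation) is

\[
P(s' \mid s, a) \;=\; \prod_{i=1}^{n} P\big(s'_i \mid \mathrm{pa}(s'_i),\, a\big),
\]
where the state \(s = (s_1, \dots, s_n)\) is a vector of variables and \(\mathrm{pa}(s'_i)\) denotes the parents of \(s'_i\) in the dynamic Bayesian network associated with action \(a\).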
In situ learning using intrinsic memristor variability via Markov chain Monte Carlo sampling
Nature Electronics, 2021
Thomas Dalgaty +2 more

