General Value Function Networks [PDF]
State construction is important for learning in partially observable environments. A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation.
Schlegel, Matthew +5 more
openaire +5 more sources
Improving Deep Policy Gradients with Value Function Search [PDF]
Deep Policy Gradient (PG) algorithms employ value networks to drive the learning of parameterized policies and reduce the variance of the gradient estimates. However, value function approximation gets stuck in local optima and struggles to fit the actual
Enrico Marchesini, Chris Amato
semanticscholar +1 more source
Value-Function-Based Sequential Minimization for Bi-Level Optimization [PDF]
Gradient-based Bi-Level Optimization (BLO) methods have been widely applied to handle modern learning tasks. However, most existing strategies are theoretically designed based on restrictive assumptions (e.g., convexity of the lower-level sub-problem ...
Risheng Liu +4 more
semanticscholar +1 more source
Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation [PDF]
In recent years, there has been great interest in, as well as many challenges to, applying reinforcement learning (RL) to recommendation systems (RS). In this paper, we summarize three key practical challenges of large-scale RL-based recommender systems: massive ...
Kai Wang +7 more
semanticscholar +1 more source
Statistical inference of the value function for reinforcement learning in infinite‐horizon settings [PDF]
Reinforcement learning is a general technique that allows an agent to learn an optimal policy and interact with an environment in sequential decision‐making problems.
C. Shi, Shengyao Zhang, W. Lu, R. Song
semanticscholar +1 more source
The detection and tracking of small, weak maneuvering radar targets in complex electromagnetic environments remains difficult to solve effectively. To address this problem, this paper proposes a dynamic programming tracking-before-detection
Fei Song +3 more
doaj +1 more source
We study the Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends the conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation. Such an
Hongyao Tang +11 more
semanticscholar +1 more source
Team Control Problem in Virtual Ellipsoid and Its Numerical Simulations
There is tremendous interest in designing feedback strategy control for clusters in modern control theory. We propose a novel numerical solution to target team control problems by using the Hamilton formalism methods.
Zhiqing Dang +6 more
doaj +1 more source
This paper solves the flow-shop scheduling problem (FSP) through reinforcement learning (RL), approximating the value function with a neural network (NN).
Jianfeng Ren, C. Ye, Feng Yang
semanticscholar +1 more source
Q-valued functions revisited [PDF]
In this note we revisit Almgren's theory of Q-valued functions, that is, functions taking values in the space of unordered Q-tuples of points in R^n. In particular: 1) we give shorter versions of Almgren's proofs of the existence of Dir-minimizing Q-valued functions, of their Hölder regularity and of the dimension estimate of their singular set; 2) we
Camillo De Lellis, Emanuele Spadaro
openaire +2 more sources

