Results 121 to 130 of about 199,848
Energy efficient group priority MAC protocol using hybrid Q-learning honey Badger Algorithm (QL-HBA) for IoT Networks. [PDF]
Venkatachalam I +4 more
europepmc +1 more source
Enhancing the Efficiency of a Cybersecurity Operations Center Using Biomimetic Algorithms Empowered by Deep Q-Learning. [PDF]
Olivares R +4 more
europepmc +1 more source
A Temporal Deep Q Learning for Optimal Load Balancing in Software-Defined Networks. [PDF]
Sharma A +2 more
europepmc +1 more source
Advancing ASD identification with neuroimaging: a novel GARL methodology integrating Deep Q-Learning and generative adversarial networks. [PDF]
Zhou Y +5 more
europepmc +1 more source
Optimizing Human-Robot Teaming Performance through Q-Learning-Based Task Load Adjustment and Physiological Data Analysis. [PDF]
Korivand S +4 more
europepmc +1 more source
Optimizing QoS and security in agriculture IoT deployments: A bioinspired Q-learning model with customized shards. [PDF]
Sonavane SM +6 more
europepmc +1 more source
Some of the following articles may not be open access.
IEEE Transactions on Neural Networks, 2000
This paper develops the theory of quad-Q-learning which is a new learning algorithm that evolved from Q-learning. Quad-Q-learning is applicable to problems that can be solved by "divide and conquer" techniques. Quad-Q-learning concerns an autonomous agent that learns without supervision to act optimally to achieve specified goals.
C. Clausen, H. Wechsler
openaire +2 more sources
2021 American Control Conference (ACC), 2021
It is well known that the extension of Watkins' algorithm to general function approximation settings is challenging: does the “projected Bellman equation” have a solution? If so, is the solution useful in the sense of generating a good policy? And, if the preceding questions are answered in the affirmative, is the algorithm consistent?
Fan Lu +3 more
openaire +1 more source
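For context, the entry above concerns extending Watkins' algorithm beyond the tabular setting. Watkins' original tabular update can be sketched as follows; the 2-state, 2-action toy MDP, its dynamics, and all parameters here are purely illustrative, not drawn from the paper.

```python
import numpy as np

# Minimal tabular Q-learning sketch (Watkins' update rule).
# The deterministic 2-state, 2-action toy MDP below is illustrative only.
n_states, n_actions = 2, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9

def step(s, a):
    # Action 1 always moves to state 1 with reward 1; action 0 to state 0 with reward 0.
    return (1, 1.0) if a == 1 else (0, 0.0)

rng = np.random.default_rng(0)
s = 0
for _ in range(500):
    a = int(rng.integers(n_actions))      # uniform random exploration
    s_next, r = step(s, a)
    # Watkins' update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```

The questions the paper raises arise precisely when the table `Q` is replaced by a parametric function approximator, where the fixed point of this update (the "projected Bellman equation") need not exist or be useful.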
Neural Computing & Applications, 2003
In this paper we introduce a novel neural reinforcement learning method. Unlike existing methods, our approach does not need a model of the system and can be trained directly using the measurements of the system. We achieve this by only using one function approximator and approximate the improved policy from this.
Stephan ten Hagen, Ben Kröse
openaire +3 more sources
2021
In this paper we focus on reinforcement learning algorithms that are sensitive to risk. The notion of risk we work with is the well-known conditional value-at-risk (CVaR). We describe a faster method for computing value iteration updates for CVaR Markov decision processes (MDPs). This improvement then opens doors for a sampling version of the algorithm.
Silvestr Stanko, Karel Macek
openaire +1 more source
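As background for the entry above: CVaR at level alpha is the expected loss over the worst alpha-fraction of outcomes. A minimal empirical-CVaR sketch is shown below; this illustrates the risk measure only and is not the paper's value-iteration algorithm.

```python
import numpy as np

def cvar(losses, alpha):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of the losses."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending order
    k = max(1, int(np.ceil(alpha * len(losses))))             # size of the tail
    return losses[:k].mean()

# For losses 1..10 at alpha = 0.2, the worst 20% is {10, 9}, so CVaR = 9.5.
```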