MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the problem of overestimation bias ($Q$-estimations are overestimating the real $Q$-values) inherent in conservative RL algorithms. Its core formula relies on the disagreement among $Q$-networks in the form of the min-batch MaxMin $Q$-networks distance which is added to ...
Soffair, Nitsan, Mannor, Shie
openaire +2 more sources
A Theoretical Analysis of Cooperative Behavior in Multi-Agent Q-learning [PDF]
A number of experimental studies have investigated whether cooperative behavior may emerge in multi-agent Q-learning. In some studies cooperative behavior did emerge, in others it did not. This report provides a theoretical analysis of this issue.
Kaymak, U., Waltman, L.R.
core +1 more source
Geometry‐driven thermal behavior in wire‐arc additive manufacturing (WAAM) influences microstructural evolution during nonequilibrium solidification of a chemically complex Fe–Cr–Nb–W–Mo–C nanocomposite system. By comparing different deposit configurations, distinct entropy–cooling rate correlations, segregation, and carbide evolution are revealed ...
Blanca Palacios +5 more
wiley +1 more source
We draw an analogy between static friction in classical mechanics and extrapolation error in off-policy RL, and use it to formulate a constraint that prevents the policy from drifting toward unsupported actions. In this study, we present Frictional Q-learning, a deep reinforcement learning algorithm for continuous control, which extends batch ...
Kim, Hyunwoo, Lee, Hyo Kyung
openaire +2 more sources
In reinforcement learning, the Q-learning algorithm provably converges to the optimal solution. However, as others have demonstrated, Q-learning can also overestimate the values and thereby spend too long exploring unhelpful states. Double Q-learning is a provably convergent alternative that mitigates some of the overestimation issues, though sometimes ...
openaire +2 more sources
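The contrast drawn in the abstract above can be illustrated with a minimal sketch, assuming the standard tabular formulation of Double Q-learning: two value tables are kept, one selects the greedy next action and the other evaluates it, which decouples selection from evaluation and reduces the upward bias of a single max. The function name, the list-of-lists tables, and the parameters `alpha`, `gamma`, and `done` are illustrative assumptions, not taken from the listed paper.

```python
import random


def double_q_update(qa, qb, state, action, reward, next_state,
                    done=False, alpha=0.1, gamma=0.99, rng=random):
    """Apply one tabular Double Q-learning update in place.

    qa and qb are lists of per-state action-value lists (hypothetical layout).
    rng only needs a .random() method returning a float in [0, 1).
    """
    # Pick at random which table is updated; the other one evaluates.
    if rng.random() < 0.5:
        update, evaluate = qa, qb
    else:
        update, evaluate = qb, qa

    if done:
        # No bootstrapping from a terminal transition.
        target = reward
    else:
        n_actions = len(update[next_state])
        # Select the greedy next action with the table being updated ...
        best = max(range(n_actions), key=lambda a: update[next_state][a])
        # ... but take its value from the other table.
        target = reward + gamma * evaluate[next_state][best]

    # Standard temporal-difference step toward the target.
    update[state][action] += alpha * (target - update[state][action])
```

Plain Q-learning would instead use `max(update[next_state])` as the bootstrap value, so the same noisy table both selects and evaluates the action; that shared maximum is the source of the overestimation the abstract mentions.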
Zap Q-Learning for Optimal Stopping Time Problems
The objective in this paper is to obtain fast converging reinforcement learning algorithms to approximate solutions to the problem of discounted cost optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on a compact subset of ...
Bušić, Ana +3 more
core
Multimodal Data‐Driven Microstructure Characterization
A self‐consistent autonomous workflow for EBSP‐based microstructure segmentation integrates PCA, GMM clustering, and cNMF with information‐theoretic parameter selection and requires no user input. An optimal ROI size related to the characteristic grain size is identified.
Qi Zhang +4 more
wiley +1 more source
A novel workflow for investigating hydride vapor phase epitaxy for GaN bulk crystal growth is proposed. It combines design of experiments (DoE) with physical simulations of mass transport and crystal growth kinetics, serving as an intermediate step between DoE and experiments.
J. Tomkovič +7 more
wiley +1 more source
Amorphous calcium phosphate (ACP) microparticles with long‐term and thermal stability are prepared with or without collagen using a scalable one‐pot spray‐drying process. Under simulated physiological conditions, they crystallize into biomimetic bone mineral and, when combined with collagen, form extrudable, fibrillar bone‐like 3D constructs.
Camila Bussola Tovani +13 more
wiley +1 more source

