Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback [PDF]
We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants.
Yuntao Bai+30 more
semanticscholar +1 more source
Training Diffusion Models with Reinforcement Learning [PDF]
Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-
Kevin Black+4 more
semanticscholar +1 more source
Effect of Randomness of Parameters on Amplification of Ground Motion in Saturated Sedimentary Valley
Based on Biot’s theory and the indirect boundary element method (IBEM), the Monte Carlo method is utilized to generate random samples to calculate the displacement response of a saturated sedimentary valley under SV wave incidence.
Ying He+4 more
doaj +1 more source
Epoxy resin concrete has superior mechanical properties compared to ordinary concrete, and will play an increasingly important role in urban construction.
Peiqi Chen+3 more
doaj +1 more source
Simulation of Spatially Correlated Multipoint Ground Motions in a Saturated Alluvial Valley
Based on Biot’s theory, the boundary element method, and spectral representation method, an effective simulation method for multiple-station spatially correlated ground motions on both bedrock and surface is developed, incorporating the spectral density ...
Ying He+4 more
doaj +1 more source
Deep Reinforcement Learning with Double Q-Learning [PDF]
The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be ...
H. V. Hasselt, A. Guez, David Silver
semanticscholar +1 more source
The discussion here considers a much more common learning condition where an agent, such as a human or a robot, has to learn to make decisions in the environment from simple feedback. Such feedback is provided only after periods of actions in the form of
F. Wörgötter, B. Porr
semanticscholar +1 more source
The use of microbially induced carbonate precipitation (MICP) technology to improve the cementation quality of oil and gas well cementing has attracted more and more attention in recent years.
Tianle Liu+5 more
doaj +1 more source
Amplification Effect of Ground Motion in Offshore Meandering Sedimentary Valley
A sedimentary valley has a visible amplification effect on a seismic response, and the current 2D topographies cannot truthfully reflect the twists and turns of a large-scale river valley.
Hailiang Wang+3 more
doaj +1 more source
Reinforcement Learning: A Survey [PDF]
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning.
L. Kaelbling, M. Littman, A. Moore
semanticscholar +1 more source