
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback [PDF]

open access: yes   arXiv.org, 2022
We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants.
Yuntao Bai   +30 more
semanticscholar   +1 more source

Training Diffusion Models with Reinforcement Learning [PDF]

open access: yes   International Conference on Learning Representations, 2023
Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives such as human-
Kevin Black   +4 more
semanticscholar   +1 more source

Effect of Randomness of Parameters on Amplification of Ground Motion in Saturated Sedimentary Valley

open access: yes   Applied Sciences, 2023
Based on Biot’s theory and the indirect boundary element method (IBEM), the Monte Carlo method is utilized to generate random samples to calculate the displacement response of a saturated sedimentary valley under SV wave incidence.
Ying He   +4 more
doaj   +1 more source

An Experimental Study on Flexural-Shear Behavior of Composite Beams in Precast Frame Structures with Post-Cast Epoxy Resin Concrete

open access: yes   Buildings, 2023
Epoxy resin concrete has superior mechanical properties compared to ordinary concrete, and will play an increasingly important role in urban construction.
Peiqi Chen   +3 more
doaj   +1 more source

Simulation of Spatially Correlated Multipoint Ground Motions in a Saturated Alluvial Valley

open access: yes   Shock and Vibration, 2021
Based on Biot’s theory, the boundary element method, and spectral representation method, an effective simulation method for multiple-station spatially correlated ground motions on both bedrock and surface is developed, incorporating the spectral density ...
Ying He   +4 more
doaj   +1 more source

Deep Reinforcement Learning with Double Q-Learning [PDF]

open access: yes   AAAI Conference on Artificial Intelligence, 2015
The popular Q-learning algorithm is known to overestimate action values under certain conditions. It was not previously known whether, in practice, such overestimations are common, whether they harm performance, and whether they can generally be ...
H. V. Hasselt, A. Guez, David Silver
semanticscholar   +1 more source

Reinforcement learning [PDF]

open access: yes   Scholarpedia, 2019
The discussion here considers a much more common learning condition where an agent, such as a human or a robot, has to learn to make decisions in the environment from simple feedback. Such feedback is provided only after periods of actions in the form of
F. Wörgötter, B. Porr
semanticscholar   +1 more source

Study on permeability law of water-based polymer drilling fluid containing CaCl2 in wellbore formation

open access: yes   地质科技通报 (Bulletin of Geological Science and Technology), 2021
The use of microbially induced carbonate precipitation (MICP) technology to improve the cementation quality of oil and gas well cementing has attracted more and more attention in recent years.
Tianle Liu   +5 more
doaj   +1 more source

Amplification Effect of Ground Motion in Offshore Meandering Sedimentary Valley

open access: yes   Shock and Vibration, 2021
A sedimentary valley has a visible amplification effect on a seismic response, and the current 2D topographies cannot truthfully reflect the twists and turns of a large-scale river valley.
Hailiang Wang   +3 more
doaj   +1 more source

Reinforcement Learning: A Survey [PDF]

open access: yes   Journal of Artificial Intelligence Research, 1996
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning.
L. Kaelbling, M. Littman, A. Moore
semanticscholar   +1 more source
