Some of the following articles may not be open access.
Safe RLHF: Safe Reinforcement Learning from Human Feedback
International Conference on Learning Representations, 2023
With the development of large language models (LLMs), striking a balance between the performance and safety of AI systems has never been more critical.
Josef Dai +7 more
semanticscholar +1 more source
International Conference on Computing Communication and Networking Technologies, 2023
Deep Reinforcement Learning (DRL) is a powerful technique for learning policies for complex decision-making tasks. In this paper, we provide an overview of DRL, including its basic components, key algorithms and techniques, and applications in areas such as ...
Sahil Sharma +2 more
semanticscholar +1 more source
A Survey of Reinforcement Learning from Human Feedback
arXiv.org, 2023
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function.
Timo Kaufmann +3 more
semanticscholar +1 more source
Evolutionary Ecology, 2007
Evidence has been accumulating to support the process of reinforcement as a potential mechanism in speciation. In many species, mate choice decisions are influenced by cultural factors, including learned mating preferences (sexual imprinting) or learned mate attraction signals (e.g., bird song).
Stein Are Sæther +2 more
openaire +3 more sources
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
arXiv.org
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable ...
DeepSeek-AI +197 more
semanticscholar +1 more source
Reinforcement learning in surgery
Surgery, 2021
Patients and physicians make essential decisions regarding diagnostic and therapeutic interventions. These actions should be performed or deferred under time constraints and uncertainty regarding patients' diagnoses and predicted response to treatment. This may lead to cognitive and judgment errors.
Shounak Datta +7 more
openaire +3 more sources
Reinforcement Learning and Deep Reinforcement Learning
2019
To better understand the state-of-the-art reinforcement learning agent, the deep Q-network, a brief review of reinforcement learning and Q-learning is given first. Recent advances in the deep Q-network are then presented, along with the double deep Q-network and the dueling deep Q-network, which go beyond the deep Q-network.
F. Richard Yu, Ying He
openaire +2 more sources
Meta-learning in Reinforcement Learning
Neural Networks, 2003
Meta-parameters in reinforcement learning should be tuned to the environmental dynamics and the animal performance. Here, we propose a biologically plausible meta-reinforcement learning algorithm for tuning these meta-parameters in a dynamic, adaptive manner.
Nicolas Schweighofer, Kenji Doya
openaire +3 more sources
Gymnasium: A Standard Interface for Reinforcement Learning Environments
arXiv.org
Reinforcement Learning (RL) is a continuously growing field that has the potential to revolutionize many areas of artificial intelligence. However, despite its promise, RL research is often hindered by the lack of standardization in environment and ...
Mark Towers +15 more
semanticscholar +1 more source
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
arXiv.org
Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 ...
Qiying Yu +34 more
semanticscholar +1 more source

