Some of the following articles may not be open access.

Safe RLHF: Safe Reinforcement Learning from Human Feedback

International Conference on Learning Representations, 2023
With the development of large language models (LLMs), striking a balance between the performance and safety of AI systems has never been more critical.
Josef Dai   +7 more
semanticscholar   +1 more source

Deep Reinforcement Learning

International Conference on Computing Communication and Networking Technologies, 2023
Deep Reinforcement Learning (DRL) is a powerful technique for learning policies for complex decision-making tasks. In this paper, we provide an overview of DRL, including its basic components, key algorithms and techniques, and applications in areas such as ...
Sahil Sharma   +2 more
semanticscholar   +1 more source

A Survey of Reinforcement Learning from Human Feedback

arXiv.org, 2023
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function.
Timo Kaufmann   +3 more
semanticscholar   +1 more source

Reinforcement and learning

Evolutionary Ecology, 2007
Evidence has been accumulating to support the process of reinforcement as a potential mechanism in speciation. In many species, mate choice decisions are influenced by cultural factors, including learned mating preferences (sexual imprinting) or learned mate attraction signals (e.g., bird song).
Stein Are Sæther   +2 more
openaire   +3 more sources

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

arXiv.org
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable ...
DeepSeek-AI   +197 more
semanticscholar   +1 more source

Reinforcement learning in surgery

Surgery, 2021
Patients and physicians make essential decisions regarding diagnostic and therapeutic interventions. These actions should be performed or deferred under time constraints and uncertainty regarding patients' diagnoses and predicted response to treatment. This may lead to cognitive and judgment errors.
Shounak Datta   +7 more
openaire   +3 more sources

Reinforcement Learning and Deep Reinforcement Learning

2019
To better understand the state-of-the-art reinforcement learning agent, the deep Q-network, a brief review of reinforcement learning and Q-learning is first given. Recent advances in the deep Q-network are then presented, along with the double deep Q-network and the dueling deep Q-network, which go beyond the deep Q-network.
F. Richard Yu, Ying He
openaire   +2 more sources

Meta-learning in Reinforcement Learning

Neural Networks, 2003
Meta-parameters in reinforcement learning should be tuned to the environmental dynamics and the animal performance. Here, we propose a biologically plausible meta-reinforcement learning algorithm for tuning these meta-parameters in a dynamic, adaptive manner.
Nicolas Schweighofer, Kenji Doya
openaire   +3 more sources

Gymnasium: A Standard Interface for Reinforcement Learning Environments

arXiv.org
Reinforcement Learning (RL) is a continuously growing field that has the potential to revolutionize many areas of artificial intelligence. However, despite its promise, RL research is often hindered by the lack of standardization in environment and ...
Mark Towers   +15 more
semanticscholar   +1 more source

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

arXiv.org
Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in OpenAI o1 ...
Qiying Yu   +34 more
semanticscholar   +1 more source
