Safe RLHF: Safe Reinforcement Learning from Human Feedback

International Conference on Learning Representations, 2023
With the development of large language models (LLMs), striking a balance between the performance and safety of AI systems has never been more critical.
Josef Dai   +7 more
semanticscholar   +1 more source

Deep Reinforcement Learning

International Conference on Computing Communication and Networking Technologies, 2023
Deep Reinforcement Learning (DRL) is a powerful technique for learning policies for complex decision-making tasks. In this paper, we provide an overview of DRL, including its basic components, key algorithms and techniques, and applications in areas such as ...
Sahil Sharma   +2 more
semanticscholar   +1 more source

Reinforcement and learning

Evolutionary Ecology, 2007
Evidence has been accumulating to support the process of reinforcement as a potential mechanism in speciation. In many species, mate choice decisions are influenced by cultural factors, including learned mating preferences (sexual imprinting) or learned mate attraction signals (e.g., bird song).
Stein Are Sæther   +2 more
openaire   +3 more sources

Reinforcement learning in surgery

Surgery, 2021
Patients and physicians make essential decisions regarding diagnostic and therapeutic interventions. These actions should be performed or deferred under time constraints and uncertainty regarding patients' diagnoses and predicted response to treatment. This may lead to cognitive and judgment errors.
Shounak Datta   +7 more
openaire   +3 more sources

Reinforcement Learning and Deep Reinforcement Learning

2019
To better understand the deep Q-network, a state-of-the-art reinforcement learning agent, a brief review of reinforcement learning and Q-learning is given first. Recent advances in the deep Q-network are then presented, along with the double deep Q-network and the dueling deep Q-network, which go beyond it.
F. Richard Yu, Ying He
openaire   +2 more sources

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

arXiv.org
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable ...
DeepSeek-AI   +197 more
semanticscholar   +1 more source
