Results 231 to 240 of about 442,274 (281)
Some of the following articles may not be open access.
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems, 2023
While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining ...
Rafael Rafailov +5 more
semanticscholar +1 more source
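The DPO abstract above describes reparameterizing the reward so that preference pairs can be optimized directly. A minimal sketch of the resulting per-pair loss, assuming sequence log-probabilities under the policy and a frozen reference model (the function name and the beta default are illustrative, not taken from the listing):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * implicit-reward margin).

    logp_w / logp_l: policy log-probs of the chosen / rejected response.
    ref_logp_w / ref_logp_l: the same log-probs under the reference model.
    """
    # Implicit reward of each response is beta * (log pi - log pi_ref);
    # the loss pushes the chosen response's implicit reward above the rejected one's.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy equals the reference model the margin is zero and the loss is log 2; preferring the chosen response relative to the reference drives the loss below that.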
Eureka: Human-Level Reward Design via Coding Large Language Models
International Conference on Learning Representations, 2023
Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks. However, harnessing them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem.
Y. Ma +8 more
semanticscholar +1 more source
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
arXiv.org, 2023
Reward models play a key role in aligning language model applications towards human preferences. However, this setup creates an incentive for the language model to exploit errors in the reward model to achieve high estimated reward, a phenomenon often ...
Jacob Eisenstein +11 more
semanticscholar +1 more source
Journal of the American College of Radiology, 2011
For much of the 20th century, psychologists and economists operated on the assumption that work is devoid of intrinsic rewards, and the only way to get people to work harder is through the use of rewards and punishments. This so-called carrot-and-stick model of workplace motivation, when applied to medical practice, emphasizes the use of financial ...
Richard B. Gunderman, Aaron P. Kamer
openaire +2 more sources
Neuroscience Letters, 2008
We report a highly significant regional increase of the BOLD response in the caudate nucleus in a group of Danish Christians while performing silent religious prayer. The effect was found in a main-effect analysis of high-structured and low-structured religious recitals relative to comparable secular recitals and to a non-narrative baseline.
Uffe Schjødt +3 more
openaire +4 more sources
Trends in Neurosciences, 2003
Advances in neurobiology permit neuroscientists to manipulate specific brain molecules, neurons and systems. This has led to major advances in the neuroscience of reward. Here, it is argued that further advances will require equal sophistication in parsing reward into its specific psychological components: (1) learning (including explicit and implicit ...
Kent C. Berridge, Terry E. Robinson
openaire +2 more sources
SimPO: Simple Preference Optimization with a Reference-Free Reward
Neural Information Processing Systems
Direct Preference Optimization (DPO) is a widely used offline preference optimization algorithm that reparameterizes reward functions in reinforcement learning from human feedback (RLHF) to enhance simplicity and training stability.
Yu Meng, Mengzhou Xia, Danqi Chen
semanticscholar +1 more source
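The SimPO abstract describes a reference-free reward: the average log-probability of a response under the policy itself, with a target margin between chosen and rejected responses. A minimal sketch under those assumptions (function names, and the beta and gamma defaults, are illustrative, not from the listing):

```python
import math

def simpo_reward(logp, length, beta=2.0):
    # Length-normalized implicit reward: no reference model is needed.
    return (beta / length) * logp

def simpo_loss(logp_w, len_w, logp_l, len_l, beta=2.0, gamma=0.5):
    """Per-pair SimPO-style loss: -log sigmoid(reward margin - target margin gamma)."""
    margin = simpo_reward(logp_w, len_w, beta) - simpo_reward(logp_l, len_l, beta) - gamma
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

The length normalization discourages the policy from inflating reward simply by generating longer responses, and dropping the reference model removes a second forward pass per example.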
ToolRL: Reward is All Tool Learning Needs
arXiv.org
Current Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. However, SFT struggles to generalize to unfamiliar or complex tool use scenarios.
Cheng Qian +7 more
semanticscholar +1 more source
2019
Neurons throughout frontal cortex show robust responses to rewards, but a challenge is determining the specific function served by these different reward signals. Most neuropsychiatric disorders involve dysfunction of circuits between frontal cortex and subcortical structures, such as the striatum. There are multiple frontostriatal loops, and different ...
openaire +3 more sources
Behavioral and Brain Sciences, 2020
The costs of and returns from actions are varied and individually concrete dimensions, combined in heterogeneous ways. The many needs of the body also fluctuate. Making action selection efficiently track some ultimate goal, whether fitness or another utility function, itself requires representational abstraction.
openaire +2 more sources

