
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking [PDF]

open access: yes | arXiv, 2023
Reward models play a key role in aligning language model applications towards human preferences. However, this setup creates an incentive for the language model to exploit errors in the reward model to achieve high estimated reward, a phenomenon often termed reward hacking.
Jacob Eisenstein   +11 more
arxiv   +2 more sources

THE REWARD OF LABOUR [PDF]

open access: green | Blackfriars, 1921
“Industrial unrest” is apt to be regarded in many quarters as something new and strange, an alarming phenomenon of the present age due to the sudden irruption of mysterious and hitherto unheard-of qualities in man. Frequent allusions are made to the “present unrest.” All manner of solutions, plans, and proposals are offered in the hope that labour ...
Joseph Clayton
openalex   +3 more sources

Direct Preference Optimization: Your Language Model is Secretly a Reward Model [PDF]

open access: yes | Neural Information Processing Systems, 2023
While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining
Rafael Rafailov   +5 more
semanticscholar   +1 more source

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment [PDF]

open access: yes | Trans. Mach. Learn. Res., 2023
Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences.
Hanze Dong   +7 more
semanticscholar   +1 more source

Reward Design with Language Models [PDF]

open access: yes | International Conference on Learning Representations, 2023
Reward design in reinforcement learning (RL) is challenging since specifying human notions of desired behavior may be difficult via reward functions or require many expert demonstrations.
Minae Kwon   +3 more
semanticscholar   +1 more source

VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training [PDF]

open access: yes | International Conference on Learning Representations, 2022
Reward and representation learning are two long-standing challenges for learning an expanding set of robot manipulation skills from sensory observations.
Yecheng Jason Ma   +5 more
semanticscholar   +1 more source

AN ASSESSMENT OF THE EFFECT OF STRATEGIC PROCUREMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE WITHIN THE PUBLIC SECTOR: CASE OF STATE ENTITY IN ZIMBABWE [PDF]

open access: yes | Business Excellence and Management, 2023
Although the concept of procurement management has recently garnered attention of researchers, the relationship between strategic procurement practices and organisational performance is still unknown.
Kudzanai CHINOGWENYA, Reward UTETE
doaj   +1 more source

Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions [PDF]

open access: yes | IEEE/RSJ International Conference on Intelligent RObots and Systems, 2022
Training a high-dimensional simulated agent with an under-specified reward function often leads the agent to learn physically infeasible strategies that are ineffective when deployed in the real world. To mitigate these unnatural behaviors, reinforcement
Alejandro Escontrela   +6 more
semanticscholar   +1 more source

Perceptions of small and medium companies toward employment equity amendments in South Africa [PDF]

open access: yes | Problems and Perspectives in Management, 2022
Small and medium companies (SMCs) are needed for the successful and meaningful development of the South African economy. These companies bring a significant reduction in unemployment levels.
Reward Utete, Thokozani Ian Nzimakwe
doaj   +1 more source

Capacity building as a strategic tool for employment equity implementation in the financial sector

open access: yes | SA Journal of Human Resource Management, 2021
Orientation: Employment equity (EE) has gradually seeped into various levels of many organisations, from private to public companies and small to large companies, in both developing and developed countries. Research purpose: The aim of this study was to
Reward Utete
doaj   +1 more source
