Results 311 to 320 of about 4,127,736 (370)
Some of the following articles may not be open access.

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

International Conference on Machine Learning
Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating the alignment with human preferences still poses significant challenges.
Wei-Lin Chiang   +10 more
semanticscholar   +1 more source

SimPO: Simple Preference Optimization with a Reference-Free Reward

Neural Information Processing Systems
Direct Preference Optimization (DPO) is a widely used offline preference optimization algorithm that reparameterizes reward functions in reinforcement learning from human feedback (RLHF) to enhance simplicity and training stability.
Yu Meng, Mengzhou Xia, Danqi Chen
semanticscholar   +1 more source
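
For orientation only (this is not part of the listed abstract): DPO's reparameterized reward is defined through a frozen reference model, whereas SimPO replaces it with a length-normalized, reference-free reward. A rough sketch of the two objectives, where π_θ is the policy, π_ref the reference model, β a scaling hyperparameter, and γ a target reward margin:

\mathcal{L}_{\mathrm{DPO}} = -\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\Big[\log\sigma\Big(\beta\log\tfrac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\tfrac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\Big)\Big]

\mathcal{L}_{\mathrm{SimPO}} = -\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\Big[\log\sigma\Big(\tfrac{\beta}{|y_w|}\log\pi_\theta(y_w\mid x) - \tfrac{\beta}{|y_l|}\log\pi_\theta(y_l\mid x) - \gamma\Big)\Big]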

ORPO: Monolithic Preference Optimization without Reference Model

Conference on Empirical Methods in Natural Language Processing
While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence.
Jiwoo Hong, Noah Lee, James Thorne
semanticscholar   +1 more source
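
For orientation only (this is not part of the listed abstract): ORPO drops the reference model by attaching an odds-ratio penalty directly to the SFT loss on the chosen response. A sketch under that reading, with odds_θ(y|x) = P_θ(y|x) / (1 − P_θ(y|x)) and λ weighting the penalty:

\mathcal{L}_{\mathrm{ORPO}} = \mathbb{E}_{(x,\,y_w,\,y_l)}\Big[\mathcal{L}_{\mathrm{SFT}}(x, y_w) - \lambda\,\log\sigma\Big(\log\tfrac{\mathrm{odds}_\theta(y_w\mid x)}{\mathrm{odds}_\theta(y_l\mid x)}\Big)\Big]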

MeLU: Meta-Learned User Preference Estimator for Cold-Start Recommendation

Knowledge Discovery and Data Mining, 2019
This paper proposes a recommender system to alleviate the cold-start problem that can estimate user preferences based on only a small number of items.
Hoyeop Lee   +4 more
semanticscholar   +1 more source

Metrizable preferences over preferences

Social Choice and Welfare, 2020
Laffond, Gilbert   +2 more
openaire   +1 more source

Iterative Reasoning Preference Optimization

Neural Information Processing Systems
Iterative preference optimization methods have recently been shown to perform well for general instruction tuning tasks, but typically make little improvement on reasoning tasks (Yuan et al., 2024, Chen et al., 2024). In this work we develop an iterative ...
Richard Yuanzhe Pang   +5 more
semanticscholar   +1 more source

HelpSteer2-Preference: Complementing Ratings with Preferences

International Conference on Learning Representations
Reward models are critical for aligning models to follow instructions, and are typically trained following one of two popular paradigms: Bradley-Terry style or Regression style. However, there is a lack of evidence that either approach is better than the other ...
Zhilin Wang   +7 more
semanticscholar   +1 more source
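
For context (not from the listed abstract): a Bradley-Terry style reward model is fit to pairwise preferences, while a regression style model is fit to scalar ratings. The two standard training objectives, with r_φ the reward model and s a human-assigned rating:

\mathcal{L}_{\mathrm{BT}} = -\mathbb{E}_{(x,\,y_w,\,y_l)}\big[\log\sigma\big(r_\phi(x, y_w) - r_\phi(x, y_l)\big)\big]
\qquad
\mathcal{L}_{\mathrm{Reg}} = \mathbb{E}_{(x,\,y,\,s)}\big[\big(r_\phi(x, y) - s\big)^2\big]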

Aesthetic preference and lateral preferences

Neuropsychologia, 1986
Subjects expressed preference for original or mirror-reversed versions of paintings. Hand preference predicted a significant proportion of the choice variance, but eye, foot and ear preference did not, nor did family sinistrality.
openaire   +2 more sources

Musical preferences

2012
This article explores our current understanding of why we like and choose to listen to the music that we do. It begins by defining terms and considering methods, moving on to discuss the biological influences of arousal and other personality traits on music preference, questions of style discrimination, and finally the cultural influences of experience ...
Alinka Greasley, Alexandra Lamont
openaire   +1 more source

Taste Preferences

2012
Personal experience, learned eating behaviors, hormones, neurotransmitters, and genetic variations affect food consumption. The decision of what to eat is modulated by taste, olfaction, and oral textural perception. Taste, in particular, has an important input into food preference, permitting individuals to differentiate nutritive and harmful ...
María Mercedes Galindo   +4 more
openaire   +2 more sources
