Results 341 to 350 of about 2,703,795
Some of the following articles may not be open access.

How to Judge a Book by its Cover


Let's read! We encounter this phrase everywhere. As children, our mothers told us to read regularly, and so did our teachers. Some books can be finished within a week, and we need encouragement to sustain the reading habit. What about now?
Kate Cuthbert
semanticscholar   +1 more source

Self-Preference Bias in LLM-as-a-Judge

arXiv.org
Automated evaluation leveraging large language models (LLMs), commonly referred to as LLM evaluators or LLM-as-a-judge, has been widely used in measuring the performance of dialogue systems. However, the self-preference bias in LLMs has posed significant ...
Koki Wataoka   +2 more
semanticscholar   +1 more source

Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment

Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) are powerful zero-shot assessors used in real-world situations such as assessing written exams and benchmarking systems. Despite these critical applications, no existing work has analyzed the vulnerability of judge-LLMs to ...
Vyas Raina   +2 more
semanticscholar   +1 more source

Improve LLM-as-a-Judge Ability as a General Ability

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
LLM-as-a-Judge leverages the generative and reasoning capabilities of large language models (LLMs) to evaluate LLM responses across diverse scenarios, providing accurate preference signals.
Jiachen Yu   +5 more
semanticscholar   +1 more source

MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation

arXiv.org
The LLM-as-a-Judge paradigm shows promise for evaluating generative content but lacks reliability in reasoning-intensive scenarios, such as programming. Inspired by recent advances in reasoning models and shifts in scaling laws, we pioneer bringing test ...
Yutong Wang   +6 more
semanticscholar   +1 more source

Can LLM be a Personalized Judge?

Conference on Empirical Methods in Natural Language Processing
Ensuring that large language models (LLMs) reflect diverse user values and preferences is crucial as their user bases expand globally. It is therefore encouraging to see the growing interest in LLM personalization within the research community.
Yijiang River Dong   +2 more
semanticscholar   +1 more source

Judge: identifying, understanding, and evaluating sources of unsoundness in call graphs

International Symposium on Software Testing and Analysis, 2019
Call graphs are widely used; in particular for advanced control- and data-flow analyses. Even though many call graph algorithms with different precision and scalability properties have been proposed, a comprehensive understanding of sources of ...
Michael Reif   +4 more
semanticscholar   +1 more source

Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates

arXiv.org
LLM-as-a-Judge has been widely applied to evaluate and compare different LLM alignment approaches (e.g., RLHF and DPO). However, concerns regarding its reliability have emerged, due to LLM judges' biases and inconsistent decision-making.
Hui Wei   +5 more
semanticscholar   +1 more source

Reassessing the good judge of personality.

Journal of Personality and Social Psychology, 2019
Are some people truly better able to accurately perceive the personality of others? Previous research suggests that the good judge may be of little practical importance and individual differences minimal.
Katherine H. Rogers, J. Biesanz
semanticscholar   +1 more source

Judges, Judging and Humour

Comedy Studies, 2020
The courtroom is, perhaps surprisingly frequently, the site of interludes, interruptions – prone to frustration or humour when the not-so-consistently-well-oiled machinery of justice lets loose a p...
openaire   +1 more source
