Some of the following articles may not be open access.
How to Judge a Book by its Cover
Let's read! We come across this sentence everywhere. When we were kids, our mothers used to tell us to always read, and so did our teachers. Some books are read in full within a week, and we need a sense of obligation to support reading. What about now?
Kate Cuthbert
semanticscholar +1 more source
Self-Preference Bias in LLM-as-a-Judge
arXiv.org
Automated evaluation leveraging large language models (LLMs), commonly referred to as LLM evaluators or LLM-as-a-judge, has been widely used in measuring the performance of dialogue systems. However, the self-preference bias in LLMs has posed significant
Koki Wataoka +2 more
semanticscholar +1 more source
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) are powerful zero-shot assessors used in real-world situations such as assessing written exams and benchmarking systems. Despite these critical applications, no existing work has analyzed the vulnerability of judge-LLMs to ...
Vyas Raina +2 more
semanticscholar +1 more source
Improve LLM-as-a-Judge Ability as a General Ability
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
LLM-as-a-Judge leverages the generative and reasoning capabilities of large language models (LLMs) to evaluate LLM responses across diverse scenarios, providing accurate preference signals.
Jiachen Yu +5 more
semanticscholar +1 more source
MCTS-Judge: Test-Time Scaling in LLM-as-a-Judge for Code Correctness Evaluation
arXiv.org
The LLM-as-a-Judge paradigm shows promise for evaluating generative content but lacks reliability in reasoning-intensive scenarios, such as programming. Inspired by recent advances in reasoning models and shifts in scaling laws, we pioneer bringing test ...
Yutong Wang +6 more
semanticscholar +1 more source
Can LLM be a Personalized Judge?
Conference on Empirical Methods in Natural Language Processing
Ensuring that large language models (LLMs) reflect diverse user values and preferences is crucial as their user bases expand globally. It is therefore encouraging to see the growing interest in LLM personalization within the research community.
Yijiang River Dong +2 more
semanticscholar +1 more source
Judge: identifying, understanding, and evaluating sources of unsoundness in call graphs
International Symposium on Software Testing and Analysis, 2019
Call graphs are widely used; in particular for advanced control- and data-flow analyses. Even though many call graph algorithms with different precision and scalability properties have been proposed, a comprehensive understanding of sources of ...
Michael Reif +4 more
semanticscholar +1 more source
arXiv.org
LLM-as-a-Judge has been widely applied to evaluate and compare different LLM alignment approaches (e.g., RLHF and DPO). However, concerns regarding its reliability have emerged, due to LLM judges' biases and inconsistent decision-making.
Hui Wei +5 more
semanticscholar +1 more source
Reassessing the good judge of personality.
Journal of Personality and Social Psychology, 2019
Are some people truly better able to accurately perceive the personality of others? Previous research suggests that the good judge may be of little practical importance and individual differences minimal.
Katherine H. Rogers, J. Biesanz
semanticscholar +1 more source
Comedy Studies, 2020
The courtroom is, perhaps surprisingly frequently, the site of interludes, interruptions – prone to frustration or humour when the not-so-consistently-well-oiled machinery of justice lets loose a p...
openaire +1 more source

