Some of the following articles may not be open access.
Management and Organization Review, 2020
This replication study was invited by the Editor in Chief of Management and Organization Review, Arie Y. Lewin. The original study by Judge, Fainshmidt, and Brown (2014) spanned the global financial crisis (2005–2010), and as such, this anomalous time ...
William Q. Judge +2 more
exaly +2 more sources
arXiv.org
Accurate and consistent evaluation is crucial for decision-making across numerous fields, yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large Language Models (LLMs) have achieved remarkable success across diverse
Jiawei Gu +11 more
semanticscholar +1 more source
Preference Leakage: A Contamination Problem in LLM-as-a-judge
arXiv.org
Large Language Models (LLMs) as judges and LLM-based data synthesis have emerged as two fundamental LLM-driven data annotation methods in model development.
Dawei Li +8 more
semanticscholar +1 more source
American Economic Review, 2023
We propose a nonparametric test for the exclusion and monotonicity assumptions invoked in instrumental variable (IV) designs based on the random assignment of cases to judges. We show its asymptotic validity and demonstrate its finite-sample performance in simulations. We apply our test in an empirical setting from the literature examining the effects
Brigham Frandsen +2 more
openaire +1 more source
Can LLMs Replace Human Evaluators? An Empirical Study of LLM-as-a-Judge in Software Engineering
Proc. ACM Softw. Eng.
Recently, large language models (LLMs) have been deployed to tackle various software engineering (SE) tasks like code generation, significantly advancing the automation of SE tasks.
Ruiqi Wang +5 more
semanticscholar +1 more source
JudgeLRM: Large Reasoning Models as a Judge
arXiv.org
Large Language Models (LLMs) are increasingly adopted as evaluators, offering a scalable alternative to human annotation. However, existing supervised fine-tuning (SFT) approaches often fall short in domains that demand complex reasoning.
Nuo Chen +6 more
semanticscholar +1 more source
Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
International Conference on Machine Learning
LLM-as-a-Judge models generate chain-of-thought (CoT) sequences intended to capture the step-by-step reasoning process that underlies the final evaluation of a response.
Swarnadeep Saha +4 more
semanticscholar +1 more source
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Assessment and evaluation have long been critical challenges in artificial intelligence (AI) and natural language processing (NLP). Traditional methods, usually matching-based or small model-based, often fall short in open-ended and dynamic scenarios ...
Dawei Li +12 more
semanticscholar +1 more source

