Humans or LLMs as the Judge? A Study on Judgement Bias

Conference on Empirical Methods in Natural Language Processing
Adopting humans and large language models (LLMs) as judges (a.k.a. human- and LLM-as-a-judge) for evaluating the performance of LLMs has recently gained attention.
Guiming Hardy Chen   +4 more
semanticscholar   +1 more source

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

International Conference on Learning Representations
LLM-as-a-Judge has been widely utilized as an evaluation method in various benchmarks and served as supervised rewards in model training. However, despite their excellence in many domains, potential issues are under-explored, undermining their ...
Jiayi Ye   +11 more
semanticscholar   +1 more source

Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) are rapidly surpassing human knowledge in many domains. While improving these models traditionally relies on costly human data, recent self-rewarding mechanisms (Yuan et al., 2024) have shown that LLMs can improve by judging ...
Tianhao Wu   +7 more
semanticscholar   +1 more source

The Impact of Artificial Intelligence on the Right to a Fair Trial: Towards a Robot Judge?

Asian Journal of Law and Economics, 2020
This paper seeks to examine the potential influences AI may have on the right to a fair trial when it is used in the courtroom. Essentially, AI systems can assume two roles in the courtroom.
Jasper Ulenaers
semanticscholar   +1 more source

Judging Truth

Annual Review of Psychology, 2020
Deceptive claims surround us, embedded in fake news, advertisements, political propaganda, and rumors. How do people know what to believe? Truth judgments reflect inferences drawn from three types of information: base rates, feelings, and consistency with information retrieved from memory.
Nadia M. Brashier, Elizabeth J. Marsh
openaire   +2 more sources

Pandering Judges

SSRN Electronic Journal, 2008
Tenured public officials such as judges are often thought to be indifferent to the concerns of the electorate and, as a result, potentially lacking in discipline but unlikely to pander to public opinion. We investigate this proposition empirically using data on promotion decisions taken by senior English judges between 1985 and 2005. Throughout this period
Jordi Blanes I Vidal, Clare Leaver
openaire   +3 more sources

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have exhibited great potential in autonomously completing tasks across real-world applications. Despite this, these LLM agents introduce unexpected safety risks when operating in interactive environments. Instead of centering
Tongxin Yuan   +11 more
semanticscholar   +1 more source

No Free Labels: Limitations of LLM-as-a-Judge Without Human Grounding

arXiv.org
LLM-as-a-Judge is a framework that uses an LLM (large language model) to evaluate the quality of natural language text - typically text that is also generated by an LLM.
Michael Krumdick   +4 more
semanticscholar   +1 more source

Agent-as-a-Judge: Evaluate Agents with Agents

International Conference on Machine Learning
Contemporary evaluation techniques are inadequate for agentic systems. These approaches either focus exclusively on final outcomes -- ignoring the step-by-step nature of agentic systems, or require excessive manual labour.
Mingchen Zhuge   +12 more
semanticscholar   +1 more source

Applying Code Quality Detection in Online Programming Judge

International Conferences on Intelligent Information Technology, 2020
This article presents an enhanced programming online judge system that not only evaluates the correctness of the submitted program code but also detects its code quality.
Xiao Liu, G. Woo
semanticscholar   +1 more source
