Benchmark Health Index: A Systematic Framework for Benchmarking the Benchmarks of LLMs
Large Language Models (LLMs) are advancing rapidly, yet the benchmarks used to measure this progress are becoming increasingly unreliable. Score inflation and selective reporting have eroded the authority of standard benchmarks, leaving the community uncertain about which evaluation results remain trustworthy.
Longyuan Zhu +3 more
openaire +2 more sources
A Benchmark Approach to Investing and Pricing [PDF]
This paper introduces a general market modeling framework, the benchmark approach, which assumes the existence of the numeraire portfolio. This is the strictly positive portfolio that, when used as a benchmark, makes all benchmarked nonnegative portfolios ...
Eckhard Platen
core
Objective: The objective of this study was to estimate the minimal important change (MIC) and minimal clinically important difference (MCID) for pain and physical function in individuals with hip osteoarthritis (OA) following a physiotherapist-guided exercise intervention.
Yareni Guerrero +8 more
wiley +1 more source
Predicting extreme defects in additive manufacturing remains a key challenge limiting its structural reliability. This study proposes a statistical framework that integrates Extreme Value Theory with advanced process indicators to explore defect–process relationships and improve the estimation of critical defect sizes. The approach provides a basis for ...
Muhammad Muteeb Butt +8 more
wiley +1 more source
Background: The integration of 7 Tesla (7T) magnetic resonance imaging (MRI) with advanced multimodal artificial intelligence (AI) models represents a promising frontier in neuroimaging.
Yifan Yuan +9 more
doaj +1 more source
Background: Structural variants (SVs) play a significant role in gene function and are implicated in numerous human diseases. With advances in sequencing technologies, identifying SVs through whole-genome sequencing (WGS) has become a key area of ...
Giuseppe Giovanni Nardone +8 more
doaj +1 more source
SSoelvsten/bdd-benchmark: TACAS 2022
The BDD Benchmark repository at the time of submitting our paper on Adiar 1.0.1 to TACAS 2022. Many experiments were left out of the TACAS paper due to space constraints, but all of them are described in the arXiv paper.
Steffan Sølvsten
core +1 more source
What Do Large Language Models Know About Materials?
If large language models (LLMs) are to be used in the materials discovery and engineering process, they must be benchmarked for the accuracy of their intrinsic materials knowledge. The current work introduces 1) a reasoning process through the processing–structure–property–performance chain and 2) a tool for benchmarking knowledge of LLMs concerning ...
Adrian Ehrenhofer +2 more
wiley +1 more source
DERI1000: A New Benchmark for Dataset Explainability Readiness
Deep learning models are increasingly evaluated not only for predictive accuracy but also for their robustness, interpretability, and data quality dependencies.
Andrej Pisarcik +2 more
doaj +1 more source
Benchmark Data Repositories for Better Benchmarking
In machine learning research, it is common to evaluate algorithms via their performance on standard benchmark datasets. While a growing body of work establishes guidelines for, and levies criticisms at, data and benchmarking practices in machine learning, comparatively less attention has been paid to the data repositories where these datasets are ...
Rachel Longjohn +3 more
openaire +3 more sources

