
Benchmark Health Index: A Systematic Framework for Benchmarking the Benchmarks of LLMs

open access: yes, CoRR
Large Language Models (LLMs) are advancing rapidly, yet the benchmarks used to measure this progress are becoming increasingly unreliable. Score inflation and selective reporting have eroded the authority of standard benchmarks, leaving the community uncertain about which evaluation results remain trustworthy.
Longyuan Zhu   +3 more
openaire   +2 more sources

A Benchmark Approach to Investing and Pricing [PDF]

open access: yes
This paper introduces a general market modeling framework, the benchmark approach, which assumes the existence of the numeraire portfolio. This is the strictly positive portfolio that when used as benchmark makes all benchmarked nonnegative portfolios ...
Eckhard Platen
core  

Minimal Important Change and Minimal Clinically Important Difference in Pain and Function With Exercise in Hip Osteoarthritis

open access: yes, Arthritis Care & Research, EarlyView.
Objective: The objective of this study was to estimate the minimal important change (MIC) and minimal clinically important difference (MCID) for pain and physical function in individuals with hip osteoarthritis (OA) following a physiotherapist‐guided exercise intervention.
Yareni Guerrero   +8 more
wiley   +1 more source

Characterization of Defect Distribution in an Additively Manufactured AlSi10Mg as a Function of Processing Parameters and Correlations with Extreme Value Statistics

open access: yes, Advanced Engineering Materials, EarlyView.
Predicting extreme defects in additive manufacturing remains a key challenge limiting its structural reliability. This study proposes a statistical framework that integrates Extreme Value Theory with advanced process indicators to explore defect–process relationships and improve the estimation of critical defect sizes. The approach provides a basis for
Muhammad Muteeb Butt   +8 more
wiley   +1 more source

Exploring the feasibility of integrating ultra‐high field magnetic resonance imaging neuroimaging with multimodal artificial intelligence for clinical diagnostics

open access: yes, iRADIOLOGY
Background: The integration of 7 Tesla (7T) magnetic resonance imaging (MRI) with advanced multimodal artificial intelligence (AI) models represents a promising frontier in neuroimaging.
Yifan Yuan   +9 more
doaj   +1 more source

A Hitchhiker Guide to Structural Variant Calling: A Comprehensive Benchmark Through Different Sequencing Technologies

open access: yes, Biomedicines
Background: Structural variants (SVs) play a significant role in gene function and are implicated in numerous human diseases. With advances in sequencing technologies, identifying SVs through whole-genome sequencing (WGS) has become a key area of ...
Giuseppe Giovanni Nardone   +8 more
doaj   +1 more source

SSoelvsten/bdd-benchmark: TACAS 2022

open access: yes, 2021
The BDD Benchmark repository at the time of submitting our paper on Adiar 1.0.1 to TACAS 2022. Quite a lot of experiments were left out of the TACAS paper due to space constraints, but all of them are described in the arXiv paper.
Steffan Sølvsten
core   +1 more source

What Do Large Language Models Know About Materials?

open access: yes, Advanced Engineering Materials, EarlyView.
If large language models (LLMs) are to be used inside the material discovery and engineering process, they must be benchmarked for the accurateness of intrinsic material knowledge. The current work introduces 1) a reasoning process through the processing–structure–property–performance chain and 2) a tool for benchmarking knowledge of LLMs concerning ...
Adrian Ehrenhofer   +2 more
wiley   +1 more source

DERI1000: A New Benchmark for Dataset Explainability Readiness

open access: yes, AI
Deep learning models are increasingly evaluated not only for predictive accuracy but also for their robustness, interpretability, and data quality dependencies.
Andrej Pisarcik   +2 more
doaj   +1 more source

Benchmark Data Repositories for Better Benchmarking

open access: yes, Advances in Neural Information Processing Systems 37
In machine learning research, it is common to evaluate algorithms via their performance on standard benchmark datasets. While a growing body of work establishes guidelines for -- and levies criticisms at -- data and benchmarking practices in machine learning, comparatively less attention has been paid to the data repositories where these datasets are ...
Rachel Longjohn   +3 more
openaire   +3 more sources
