The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation.
Abstract. Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has yet been reached on a single preferred measure.
Chicco D, Jurman G.
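The MCC paper above contrasts MCC with F1 and accuracy on binary confusion matrices. The following is a minimal sketch of the three formulas, computed from the four cells TP, FP, FN and TN. The function name and the example counts are hypothetical. The MCC-equals-0 rule for a zero denominator is a common convention, assumed here.

```python
import math

def confusion_metrics(tp, fp, fn, tn):
    """Return (accuracy, F1, MCC) for a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # F1 uses TP, FP and FN only; true negatives never enter it.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report MCC = 0 when any row or column sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f1, mcc

# Hypothetical imbalanced test set: 95 positives, 5 negatives,
# and a classifier that labels every example positive.
print(confusion_metrics(tp=95, fp=5, fn=0, tn=0))
```

In this degenerate case, accuracy is 0.95 and F1 is about 0.974, yet MCC is 0. The classifier has learned nothing about the negative class, and only MCC exposes that.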
90% F1 Score in Relation Triple Extraction: Is it Real?
Accepted at the GenBench workshop @ EMNLP ...
Pratik Saini and 3 co-authors
Machine-learning classification of astronomical sources: estimating F1-score in the absence of ground truth
Abstract. Machine-learning-based classifiers have become indispensable in astrophysics. They separate astronomical sources into various classes with the computational efficiency needed for the enormous data volumes that wide-area surveys now typically produce.
A. Humphrey and 7 co-authors
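One way to approximate F1 without labels is to use the classifier's own probabilities, assuming they are well calibrated. Each probability is treated as the chance that a source truly belongs to the target class, and expected TP, FP and FN counts are formed from those chances. This is an illustrative sketch, not necessarily the method of Humphrey et al. The function name and the 0.5 decision threshold are hypothetical. The result is a ratio of expected counts, which only approximates the expected value of F1.

```python
def estimated_f1(probs, threshold=0.5):
    """Label-free F1 estimate, assuming each p in probs is a calibrated P(positive)."""
    exp_tp = exp_fp = exp_fn = 0.0
    for p in probs:
        if p >= threshold:
            exp_tp += p        # expected chance this predicted positive is genuine
            exp_fp += 1.0 - p  # expected chance it is a false alarm
        else:
            exp_fn += p        # expected chance a rejected source was actually positive
    denom = 2 * exp_tp + exp_fp + exp_fn
    return 2 * exp_tp / denom if denom else 0.0

# Hypothetical probabilities for four unlabelled sources.
print(estimated_f1([0.9, 0.8, 0.3, 0.1]))
```

The estimate is only as good as the calibration assumption. A classifier that is overconfident on the survey data would inflate it.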
Anomaly Detection: How to Artificially Increase your F1-Score with a Biased Evaluation Protocol
Anomaly detection is a widely explored domain in machine learning. Many models are proposed in the literature and compared using different metrics measured on various datasets. The most popular metrics used to compare performance are F1-score, AUC and AVPR. In this paper, we show that F1-score and AVPR are highly sensitive to the contamination rate.
Damien Fourure and 3 co-authors
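The contamination-rate sensitivity described above can be seen with a detector whose true-positive rate (TPR) and false-positive rate (FPR) are held fixed. Only the fraction of anomalies in the test set changes. The sketch below uses hypothetical values (TPR 0.8, FPR 0.05). It illustrates the dependency rather than reproducing the paper's evaluation protocol.

```python
def f1_from_rates(tpr, fpr, contamination):
    """F1 of a detector with fixed TPR/FPR at a given anomaly (positive) rate."""
    # Expected counts per test sample; dataset size cancels out of F1.
    tp = tpr * contamination
    fn = (1 - tpr) * contamination
    fp = fpr * (1 - contamination)
    return 2 * tp / (2 * tp + fp + fn)

for c in (0.01, 0.1, 0.5):
    print(f"contamination={c:.2f}  F1={f1_from_rates(0.8, 0.05, c):.3f}")
```

With the detector itself unchanged, F1 rises from about 0.237 to 0.711 to 0.865 as contamination goes from 1% to 10% to 50%. Comparing F1 across datasets or splits with different contamination can therefore be misleading.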
Maximum F1-score training for end-to-end mispronunciation detection and diagnosis of L2 English speech
End-to-end (E2E) neural models are increasingly attracting attention as a promising modeling approach for mispronunciation detection and diagnosis (MDD). Typically, these models are trained by optimizing a cross-entropy criterion, which corresponds to improving the log-likelihood of the training data.
Bicheng Yan and 3 co-authors
Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models
In pursuit of the perfect supervised NLP classifier, razor-thin margins and low-resource test sets can make modeling decisions difficult. Popular metrics such as Accuracy, Precision, and Recall are often insufficient, as they fail to give a complete picture of the model's behavior. We present a probabilistic extension of Precision, Recall, and F1 score ...
Reda Yacouby, Dustin Axman
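One simple way to make precision, recall and F1 reflect model confidence is to replace hard 0/1 predictions with predicted probabilities when counting TP, FP and FN. The sketch below uses such confidence-weighted ("soft") counts. It is an assumption-labelled illustration and may differ in detail from the definitions in Yacouby and Axman's paper. The function name and the example data are hypothetical.

```python
def soft_prf(probs, labels):
    """Precision, recall and F1 from confidence-weighted counts.

    probs:  model confidence that each example is positive (0..1).
    labels: gold labels, 1 = positive, 0 = negative.
    """
    tp = sum(p for p, y in zip(probs, labels) if y == 1)      # confidence on true positives
    fn = sum(1 - p for p, y in zip(probs, labels) if y == 1)  # missing confidence on positives
    fp = sum(p for p, y in zip(probs, labels) if y == 0)      # confidence wasted on negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

labels = [1, 1, 0]
print(soft_prf([0.9, 0.9, 0.1], labels))     # confident model
print(soft_prf([0.55, 0.55, 0.45], labels))  # barely-over-threshold model
```

At a 0.5 threshold, both hypothetical models make identical hard predictions and have a hard F1 of 1.0. The soft F1 separates them (about 0.92 versus 0.62), which is the kind of distinction that razor-thin margins can otherwise hide.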
An intruder from another world: F1-score.
The F1-score, also called the F-score or F-measure, estimates the classification ability of a test. It is frequently used in data science and artificial intelligence algorithms, and it can be useful for evaluating diagnostic tests.
Manuel Molina
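For readers coming from diagnostic-test evaluation, the F1-score is the harmonic mean of positive predictive value (PPV, i.e. precision) and sensitivity (Se, i.e. recall). Written in confusion-matrix counts, the formula shows that true negatives, and hence specificity, play no part in it:

```latex
F_1 = \frac{2\,\mathrm{PPV}\cdot\mathrm{Se}}{\mathrm{PPV} + \mathrm{Se}}
    = \frac{2\,TP}{2\,TP + FP + FN},
\qquad
\mathrm{PPV} = \frac{TP}{TP + FP},\quad
\mathrm{Se} = \frac{TP}{TP + FN}
```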
About Evaluation of F1 Score for RECENT Relation Extraction System
This document discusses the F1 score evaluation used in the article 'Relation Classification with Entity Type Restriction' by Shengfei Lyu and Huanhuan Chen, published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
Michał Olek
Confidence Intervals for the F1 Score: A Comparison of Four Methods
In Natural Language Processing (NLP), binary classification algorithms are often evaluated using the F1 score. Because the sample F1 score is an estimate of the population F1 score, it is not sufficient to report the sample F1 score without an indication ...
Kevin Fu Yuan Lam
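A common, general-purpose way to attach uncertainty to a sample F1 score is the percentile bootstrap. It resamples test examples with replacement, recomputes F1 on each resample, and reads the interval off the sorted scores. The sketch below is one such method. It is not claimed to be one of the four methods compared in the paper above. The function names, `n_boot`, and the fixed seed are illustrative choices.

```python
import random

def f1_score(y_true, y_pred):
    """Binary F1 from 0/1 label lists; returns 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_f1_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) confidence interval for F1."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample examples with replacement
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lo = scores[int((alpha / 2) * n_boot)]
    hi = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

y_true = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0] * 5
y_pred = [1, 0, 1, 0, 1, 1, 0, 1, 0, 0] * 5
print(f1_score(y_true, y_pred), bootstrap_f1_ci(y_true, y_pred))
```

The bootstrap makes no distributional assumption about F1. However, it can be unreliable on very small test sets, which is one motivation for comparing alternative interval methods.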
Keeping Pathologists in the Loop and an Adaptive F1-Score Threshold Method for Mitosis Detection in Canine Perivascular Wall Tumours.
Performing a mitosis count (MC) is the diagnostic task of histologically grading canine Soft Tissue Sarcoma (cSTS). However, mitosis count is subject to inter- and intra-observer variability. Deep learning models can offer a standardisation in the process of MC used to histologically grade canine Soft Tissue Sarcomas.
Rai T and 8 co-authors
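The entry above refers to choosing a detection threshold by F1. A generic version of this idea scans candidate thresholds on a labelled validation set and keeps the one with the highest F1. The sketch below shows that generic search. It is not the adaptive method of Rai et al., whose details are not given here. The function name and example scores are hypothetical.

```python
def best_f1_threshold(scores, labels):
    """Return (threshold, F1) maximising F1 when predicting positive for score >= threshold."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate threshold
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Hypothetical detector scores and validation labels.
print(best_f1_threshold([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The scan is quadratic in the number of examples, which is fine for small validation sets. Sorting the scores once and updating the counts incrementally would make it linear after the sort.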