
90% F1 Score in Relation Triple Extraction: Is it Real? [PDF]

open access: gold; Proceedings of the 1st GenBench Workshop on (Benchmarking) Generalisation in NLP, 2023
Accepted in GenBench workshop @ EMNLP ...
Pratik Saini   +3 more
openalex   +3 more sources

Calibrating F1 Scores for Fair Performance Comparison of Binary Classification Models With Application to Student Dropout Prediction

open access: yes; IEEE Access
The F1 score has been widely used to measure the performance of machine learning models. However, it varies with the ratio of the positive class in the training data, $\pi$.
Hyeon Gyu Kim, Yoohyun Park
doaj   +2 more sources
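A minimal pure-Python sketch of the dependence this paper addresses (this illustrates the problem, not the paper's calibration method): for a classifier with fixed sensitivity and specificity, the implied F1 score still changes with the positive-class ratio $\pi$, because precision depends on prevalence.

```python
# Sketch: F1 implied by fixed error rates changes with the
# positive-class ratio pi (prevalence), even though the
# classifier itself is unchanged.
def f1_at_prevalence(sensitivity, specificity, pi):
    """F1 score implied by fixed sensitivity/specificity at prevalence pi."""
    tp = sensitivity * pi            # true-positive rate mass
    fp = (1 - specificity) * (1 - pi)  # false-positive rate mass
    precision = tp / (tp + fp)
    recall = sensitivity
    return 2 * precision * recall / (precision + recall)

for pi in (0.05, 0.2, 0.5):
    print(f"pi={pi:.2f}  F1={f1_at_prevalence(0.8, 0.9, pi):.3f}")
```

The same sensitivity/specificity pair yields very different F1 values as $\pi$ grows, which is why cross-dataset F1 comparisons need calibration.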

Performance Metrics for Multilabel Emotion Classification: Comparing Micro, Macro, and Weighted F1-Scores

open access: yes; Applied Sciences
This study compares various F1-score variants—micro, macro, and weighted—to assess their performance in evaluating text-based emotion classification. Lexicon distillation is employed using the multilabel emotion-annotated datasets XED and GoEmotions.
Maria Cristina Hinojosa Lee   +2 more
doaj   +3 more sources
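The three averaging schemes this paper compares can be sketched in plain Python (a self-contained illustration of the standard definitions, not code from the paper; the toy emotion labels are invented). Macro-F1 averages per-class F1 equally, weighted-F1 weights by class support, and micro-F1 pools counts over classes.

```python
from collections import Counter

def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def micro_macro_weighted_f1(y_true, y_pred):
    labels = sorted(set(y_true))
    f1s = {c: per_class_f1(y_true, y_pred, c) for c in labels}
    support = Counter(y_true)
    macro = sum(f1s.values()) / len(labels)
    weighted = sum(f1s[c] * support[c] for c in labels) / len(y_true)
    # Micro-F1 pools TP/FP/FN across classes; in single-label
    # multi-class classification it reduces to accuracy.
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return micro, macro, weighted

y_true = ["joy", "joy", "joy", "anger", "fear"]
y_pred = ["joy", "joy", "anger", "anger", "joy"]
print(micro_macro_weighted_f1(y_true, y_pred))
```

On imbalanced label sets the three averages can diverge sharply: macro-F1 is pulled down by rare classes (here "fear"), while weighted-F1 tracks the frequent ones.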

The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. [PDF]

open access: yes; BMC Genomics, 2020
Background: To evaluate binary classifications and their confusion matrices, scientific researchers can employ several statistical rates, according to the goal of the experiment they are investigating. Despite being a crucial issue in machine learning, no widespread consensus has been reached on a unified elective chosen measure yet.
Chicco D, Jurman G.
europepmc   +6 more sources
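A small self-contained sketch of the contrast this paper draws (the confusion-matrix counts are an invented toy example): on an imbalanced test set, a classifier that labels almost everything positive can score a respectable F1 while MCC, which uses all four cells of the confusion matrix, stays near zero.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

def f1(tp, fp, fn):
    """F1 ignores true negatives entirely."""
    return 2 * tp / (2 * tp + fp + fn)

# A near-trivial classifier that predicts "positive" most of the time.
tp, fp, fn, tn = 90, 80, 10, 20
print(f"F1  = {f1(tp, fp, fn):.3f}")   # looks acceptable
print(f"MCC = {mcc(tp, fp, fn, tn):.3f}")  # reveals weak correlation
```

Because F1 never looks at the true negatives (here only 20 of 100 negatives are classified correctly), it cannot penalize this behavior; MCC can.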

Estimating the Uncertainty of Average F1 Scores [PDF]

open access: yes; Proceedings of the 2015 International Conference on The Theory of Information Retrieval, 2015
In multi-class text classification, the performance (effectiveness) of a classifier is usually measured by micro-averaged and macro-averaged F1 scores. However, the scores themselves do not tell us how reliable they are in terms of forecasting the classifier's future performance on unseen data.
Zhang, Dell, Wang, J., Zhao, X.
openaire   +1 more source
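One generic way to attach an uncertainty estimate to an averaged F1 score is a percentile bootstrap over test examples; this is a stdlib-only sketch of that idea, not the method proposed in this paper (which takes a different, model-based approach), and the label pairs below are invented.

```python
import random

def macro_f1(pairs, labels):
    """Macro-averaged F1 over (true, predicted) label pairs."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

def bootstrap_ci(pairs, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for macro-F1."""
    rng = random.Random(seed)
    stats = sorted(
        macro_f1([rng.choice(pairs) for _ in pairs], labels)
        for _ in range(n_boot))
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

pairs = [("a", "a"), ("a", "b"), ("b", "b"), ("b", "b"), ("c", "c"),
         ("c", "a"), ("a", "a"), ("b", "a"), ("c", "c"), ("a", "a")]
labels = ["a", "b", "c"]
print(macro_f1(pairs, labels), bootstrap_ci(pairs, labels))
```

On small test sets the resulting interval is typically wide, which is exactly the point these papers make: a single averaged F1 number hides substantial estimation uncertainty.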

A Bayesian Hierarchical Model for Comparing Average F1 Scores [PDF]

open access: yes; 2015 IEEE International Conference on Data Mining, 2015
In multi-class text classification, the performance (effectiveness) of a classifier is usually measured by micro-averaged and macro-averaged F1 scores. However, the scores themselves do not tell us how reliable they are in terms of forecasting the classifier's future performance on unseen data.
Zhang, Dell   +3 more
openaire   +1 more source

Time to retire F1-binary score for action unit detection

open access: yes; Pattern Recognition Letters, 2023
Detecting action units is an important task in face analysis, especially in facial expression recognition. This is due, in part, to the idea that expressions can be decomposed into multiple action units. To evaluate systems that detect action units, F1-binary score is often used as the evaluation metric.
Saurabh Hinduja   +3 more
openaire   +2 more sources

Confidence interval for micro-averaged F1 and macro-averaged F1 scores [PDF]

open access: yes; Applied Intelligence, 2021
A binary classification problem is common in the medical field, and we often use sensitivity, specificity, accuracy, and negative and positive predictive values as measures of performance of a binary predictor. In computer science, a classifier is usually evaluated with precision (positive predictive value) and recall (sensitivity).
Kanae Takahashi   +3 more
openaire   +2 more sources
