Semantic Evaluation of Multilingual Data-to-Text Generation via NLI Fine-Tuning: Precision, Recall and F1 scores [PDF]
William Soto Martinez +2 more
openalex +2 more sources
Anomaly Detection: How to Artificially Increase Your F1-Score with a Biased Evaluation Protocol [PDF]
Anomaly detection is a widely explored domain in machine learning. Many models are proposed in the literature, and compared through different metrics measured on various datasets. The most popular metrics used to compare performances are F1-score, AUC and AVPR. In this paper, we show that F1-score and AVPR are highly sensitive to the contamination rate.
Damien Fourure +3 more
openaire +2 more sources
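The sensitivity of F1 to the contamination rate noted in the entry above can be illustrated with a small calculation. The sketch below is not the paper's evaluation protocol; it assumes a hypothetical detector with a fixed true-positive rate and false-positive rate, and the function name `expected_f1` is made up for illustration. It only shows that the same detector gets a different F1 when the share of anomalies in the test set changes.

```python
def expected_f1(tpr, fpr, contamination):
    """Expected F1 of a detector with fixed TPR/FPR on a test set
    whose fraction of anomalies (positives) is `contamination`.

    All three arguments are illustrative assumptions, not values
    taken from the paper.
    """
    tp = tpr * contamination              # anomalies correctly flagged
    fn = (1.0 - tpr) * contamination      # anomalies missed
    fp = fpr * (1.0 - contamination)      # normal points wrongly flagged
    denom = 2.0 * tp + fp + fn
    return 0.0 if denom == 0 else 2.0 * tp / denom


# Same detector, two test sets that differ only in contamination rate.
low = expected_f1(tpr=0.8, fpr=0.1, contamination=0.05)
high = expected_f1(tpr=0.8, fpr=0.1, contamination=0.5)
print(f"F1 at  5% contamination: {low:.3f}")
print(f"F1 at 50% contamination: {high:.3f}")
```

Under these assumed rates the score rises from roughly 0.43 to roughly 0.84 although the detector itself is unchanged, which is why comparing F1 values across test sets with different contamination rates can be misleading.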
sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification
Multiclass multilabel classification is the task of attributing multiple labels to examples via predictions. Current models formulate a reduction of the multilabel setting into either multiple binary classifications or multiclass classification, allowing for the use of existing loss functions (sigmoid, cross-entropy, logistic, etc.).
Bénédict, G. +3 more
openaire +3 more sources
Automatic Feature Segmentation in Dental Periapical Radiographs
A large number of archived digital images makes it easy for radiology to provide data for Artificial Intelligence (AI) evaluation, and AI algorithms are increasingly applied to detecting diseases.
Tugba Ari +10 more
doaj +1 more source
Confidence Intervals for the F1 Score: A Comparison of Four Methods
31 pages, 3 ...
Lam, Kevin Fu Yuan +2 more
openaire +2 more sources
Coreference Resolution through a seq2seq Transition-Based System
Most recent coreference resolution systems use search algorithms over possible spans to identify mentions and resolve coreference. We instead present a coreference resolution system that uses a text-to-text (seq2seq) paradigm to predict mentions and ...
Bernd Bohnet +2 more
doaj +1 more source
Probabilistic Extension of Precision, Recall, and F1 Score for More Thorough Evaluation of Classification Models [PDF]
In pursuit of the perfect supervised NLP classifier, razor-thin margins and low-resource test sets can make modeling decisions difficult. Popular metrics such as Accuracy, Precision, and Recall are often insufficient, as they fail to give a complete picture of the model’s behavior. We present a probabilistic extension of Precision, Recall, and F1 score,
Reda Yacouby, Dustin Axman
openaire +1 more source
Thresholding Classifiers to Maximize F1 Score
This paper provides new insight into maximizing F1 scores in the context of binary classification and also in the context of multilabel classification. The harmonic mean of precision and recall, F1 score is widely used to measure the success of a binary classifier when one class is rare.
Lipton, Zachary Chase +2 more
openaire +2 more sources
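The definition quoted in the entry above (F1 as the harmonic mean of precision and recall) and the idea of choosing a decision threshold to maximize it can be sketched as follows. This is a minimal brute-force illustration, not the method analyzed in the paper; the helper names `f1_score` and `best_threshold` and the toy labels and scores are assumptions made for the example.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # convention: F1 is 0 when there are no true positives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, scores):
    """Try every observed score as a threshold and keep the one with the
    highest F1 (predict positive when score >= threshold)."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy data (hypothetical): the positive class is rare, 3 of 10 examples.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.6, 0.7, 0.9]

t, f1 = best_threshold(y_true, scores)
default_f1 = f1_score(y_true, [1 if s >= 0.5 else 0 for s in scores])
print(f"best threshold = {t}, F1 = {f1:.3f}; F1 at 0.5 = {default_f1:.3f}")
```

On this toy data the F1-maximizing threshold (0.45) lies below the conventional 0.5 cutoff, which illustrates why the threshold is worth tuning when one class is rare.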
Analyze Informant-Based Questionnaire for The Early Diagnosis of Senile Dementia Using Deep Learning
Objective: This paper proposes a multiclass deep learning method for the classification of dementia using an informant-based questionnaire. Methods: A deep neural network classification model based on the Keras framework is proposed in this paper.
Fubao Zhu +8 more
doaj +1 more source
Machine learning approaches for anomaly detection of water quality on a real-world data set
Accurate detection of water quality changes is a crucial task for water companies. Water supply companies must provide safe drinking water. Nowadays, in different areas, we find sensitive sensors that monitor data over time.
Fitore Muharemi +2 more
doaj +1 more source

