
Language (Technology) is Power: A Critical Survey of "Bias" in NLP [PDF]

open access: yes · Annual Meeting of the Association for Computational Linguistics, 2020
We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing "bias" is an inherently normative process.
Solon Barocas   +3 more
core   +2 more sources

Are NLP Models really able to Solve Simple Math Word Problems? [PDF]

open access: yes · North American Chapter of the Association for Computational Linguistics, 2021
The problem of designing NLP solvers for math word problems (MWPs) has seen sustained research activity and steady gains in test accuracy. Since existing solvers achieve high performance on the benchmark datasets for elementary-level MWPs containing ...
Arkil Patel   +2 more
semanticscholar   +1 more source

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks [PDF]

open access: yes · Conference on Empirical Methods in Natural Language Processing, 2022
How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions ...
Yizhong Wang   +39 more
semanticscholar   +1 more source

From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models [PDF]

open access: yes · Annual Meeting of the Association for Computational Linguistics, 2023
Language models (LMs) are pretrained on diverse data sources—news, discussion forums, books, online encyclopedias. A significant portion of this data includes facts and opinions which, on one hand, celebrate democracy and diversity of ideas, and on the ...
Shangbin Feng   +3 more
semanticscholar   +1 more source

Preregistering NLP research [PDF]

open access: yes · Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
Accepted at NAACL2021; pre-final draft, comments ...
Emiel van Miltenburg   +2 more
openaire   +3 more sources

MasakhaNER: Named Entity Recognition for African Languages

open access: yes · Transactions of the Association for Computational Linguistics, 2021
We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten ...
David Ifeoluwa Adelani   +60 more
doaj   +1 more source

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList [PDF]

open access: yes · Annual Meeting of the Association for Computational Linguistics, 2020
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific ...
Marco Tulio Ribeiro   +3 more
semanticscholar   +1 more source

A Survey of Data Augmentation Approaches for NLP [PDF]

open access: yes · Findings, 2021
Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks that require large amounts of training data.
Steven Y. Feng   +6 more
semanticscholar   +1 more source

The State and Fate of Linguistic Diversity and Inclusion in the NLP World [PDF]

open access: yes · Annual Meeting of the Association for Computational Linguistics, 2020
Language technologies contribute to promoting multilingualism and linguistic diversity around the world. However, only a very small number of the over 7000 languages of the world are represented in the rapidly evolving language technologies and ...
Pratik M. Joshi   +4 more
semanticscholar   +1 more source

Sub-Character Tokenization for Chinese Pretrained Language Models

open access: yes · Transactions of the Association for Computational Linguistics, 2023
Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token.
Chenglei Si   +8 more
doaj   +1 more source
