Language (Technology) is Power: A Critical Survey of "Bias" in NLP [PDF]
We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing "bias" is an inherently normative process.
Barocas, Solon +3 more
core +2 more sources
Are NLP Models really able to Solve Simple Math Word Problems? [PDF]
The problem of designing NLP solvers for math word problems (MWP) has seen sustained research activity and steady gains in the test accuracy. Since existing solvers achieve high performance on the benchmark datasets for elementary level MWPs containing ...
Arkil Patel +2 more
semanticscholar +1 more source
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks [PDF]
How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions ...
Yizhong Wang +39 more
semanticscholar +1 more source
From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models [PDF]
Language models (LMs) are pretrained on diverse data sources—news, discussion forums, books, online encyclopedias. A significant portion of this data includes facts and opinions which, on one hand, celebrate democracy and diversity of ideas, and on the ...
Shangbin Feng +3 more
semanticscholar +1 more source
Preregistering NLP research [PDF]
Accepted at NAACL 2021; pre-final draft, comments ...
van Miltenburg, Emiel +2 more
openaire +3 more sources
MasakhaNER: Named Entity Recognition for African Languages
We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten ...
David Ifeoluwa Adelani +60 more
doaj +1 more source
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList [PDF]
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific ...
Marco Tulio Ribeiro +3 more
semanticscholar +1 more source
A Survey of Data Augmentation Approaches for NLP [PDF]
Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks that require large amounts of training data.
Steven Y. Feng +6 more
semanticscholar +1 more source
The State and Fate of Linguistic Diversity and Inclusion in the NLP World [PDF]
Language technologies contribute to promoting multilingualism and linguistic diversity around the world. However, only a very small number of the over 7000 languages of the world are represented in the rapidly evolving language technologies and ...
Pratik M. Joshi +4 more
semanticscholar +1 more source
Sub-Character Tokenization for Chinese Pretrained Language Models
Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token.
Chenglei Si +8 more
doaj +1 more source

