BERTweet: A pre-trained language model for English Tweets [PDF]
We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet, having the same architecture as BERT-base (Devlin et al., 2019), is trained using the RoBERTa pre-training procedure (Liu et al., 2019 ...
Dat Quoc Nguyen, Thanh Vu, A. Nguyen
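Since the snippet above describes BERTweet as a BERT-base-sized model pre-trained with the RoBERTa procedure, a minimal sketch of encoding a tweet with it may help. This assumes the checkpoint is published on the Hugging Face hub as vinai/bertweet-base and uses the transformers library; it is an illustration, not the authors' own usage example.

```python
# Minimal sketch: encode a tweet with BERTweet (assumes the checkpoint is
# available on the Hugging Face hub as "vinai/bertweet-base").
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
model = AutoModel.from_pretrained("vinai/bertweet-base")

inputs = tokenizer("SC has first two presumptive cases of coronavirus", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings, shape (batch, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```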
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models [PDF]
Pretrained language models, especially masked language models (MLMs) have seen success across many NLP tasks. However, there is ample evidence that they use the cultural biases that are undoubtedly present in the corpora they are trained on, implicitly ...
Nikita Nangia +3 more
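CrowS-Pairs scores a masked language model on pairs of minimally different sentences, one more stereotyping than the other. Below is a simplified sketch of that idea, not the paper's exact metric: a pseudo-log-likelihood for each sentence, obtained by masking one token at a time, is compared across the pair. The example sentences and the choice of bert-base-uncased are illustrative assumptions.

```python
# Simplified pseudo-log-likelihood comparison in the spirit of CrowS-Pairs
# (not the paper's exact scoring implementation).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum of log P(token | rest of sentence), masking one token at a time."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

more = pseudo_log_likelihood("People who live in trailer parks are alcoholics.")
less = pseudo_log_likelihood("People who live in mansions are alcoholics.")
print("model prefers the stereotyping sentence:", more > less)
```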
SciBERT: A Pretrained Language Model for Scientific Text [PDF]
Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive. We release SciBERT, a pretrained language model based on BERT (Devlin et al. ...
Iz Beltagy, Kyle Lo, Arman Cohan
How Much Knowledge Can You Pack into the Parameters of a Language Model? [PDF]
It has recently been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries.
Adam Roberts, Colin Raffel, Noam Shazeer
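The claim that a model can "implicitly store and retrieve knowledge using natural language queries" corresponds to the paper's closed-book question answering setup: the model sees only the question and must answer from its parameters, with no retrieved context. A rough sketch follows; it assumes one of the accompanying T5 checkpoints fine-tuned on Natural Questions is available on the Hugging Face hub as google/t5-small-ssm-nq.

```python
# Closed-book QA sketch: answer a factual question from model parameters alone
# (assumes the "google/t5-small-ssm-nq" checkpoint is available on the hub).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "google/t5-small-ssm-nq"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

inputs = tokenizer("where is the eiffel tower located?", return_tensors="pt")
answer_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(answer_ids[0], skip_special_tokens=True))
```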
Language Models as Knowledge Bases? [PDF]
Recent progress in pretraining language models on large textual corpora led to a surge of improvements for downstream NLP tasks. Whilst learning linguistic knowledge, these models may also be storing relational knowledge present in the training data, and ...
F. Petroni +6 more
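The probing setup described here queries an off-the-shelf masked language model with fill-in-the-blank cloze statements and inspects its top predictions. A minimal sketch using the transformers fill-mask pipeline, with bert-base-cased as an illustrative model:

```python
# LAMA-style cloze probe: ask a masked LM to fill in a factual blank.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")
for pred in fill("Dante was born in [MASK]."):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```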
Engagement in language learning: A systematic review of 20 years of research methods and definitions
At the turn of the new millennium, in an article published in Language Teaching Research in 2000, Dörnyei and Kormos proposed that ‘active learner engagement is a key concern’ for all instructed language learning. Since then, language engagement research ...
Phil Hiver +3 more
A large annotated corpus for learning natural language inference [PDF]
Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research ...
Samuel R. Bowman +3 more
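As a quick way to see what the corpus contains, the sketch below loads it through the Hugging Face datasets library, assuming the copy published under the dataset id "snli", and prints one labeled premise/hypothesis pair.

```python
# Minimal sketch: inspect one SNLI premise/hypothesis pair
# (assumes the Hugging Face `datasets` copy published under the id "snli").
from datasets import load_dataset

snli = load_dataset("snli", split="train")
labels = ["entailment", "neutral", "contradiction"]

example = next(ex for ex in snli if ex["label"] != -1)  # -1 marks missing gold label
print("premise:   ", example["premise"])
print("hypothesis:", example["hypothesis"])
print("label:     ", labels[example["label"]])
```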
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data [PDF]
Many modern NLP systems rely on word embeddings, previously trained in an unsupervised manner on large corpora, as base features. Efforts to obtain embeddings for larger chunks of text, such as sentences, have however not been so successful.
Alexis Conneau +4 more
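The paper's recipe is to train a shared sentence encoder on NLI data and reuse it as a universal feature extractor; the pair classifier sees the two encoded sentences combined as [u, v, |u - v|, u * v]. The sketch below shows that pair-combination head in PyTorch, with a deliberately simplified bag-of-embeddings encoder standing in for the paper's BiLSTM-with-max-pooling; dimensions and vocabulary size are arbitrary.

```python
# Sketch of the NLI training signal used to learn sentence representations.
import torch
import torch.nn as nn

class MeanEncoder(nn.Module):
    """Placeholder encoder: average of token embeddings (stand-in for a BiLSTM)."""
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids):              # (batch, seq_len) -> (batch, dim)
        return self.embed(token_ids).mean(dim=1)

class NLIClassifier(nn.Module):
    """Shared encoder for both sentences, classifier over [u, v, |u-v|, u*v]."""
    def __init__(self, encoder: nn.Module, dim: int, n_classes: int = 3):
        super().__init__()
        self.encoder = encoder                  # shared between premise and hypothesis
        self.head = nn.Sequential(
            nn.Linear(4 * dim, 512), nn.ReLU(), nn.Linear(512, n_classes)
        )

    def forward(self, premise, hypothesis):
        u, v = self.encoder(premise), self.encoder(hypothesis)
        features = torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)
        return self.head(features)              # logits: entailment/neutral/contradiction

encoder = MeanEncoder(vocab_size=1000, dim=64)
model = NLIClassifier(encoder, dim=64)
premise = torch.randint(0, 1000, (2, 7))        # dummy token ids
hypothesis = torch.randint(0, 1000, (2, 5))
print(model(premise, hypothesis).shape)          # torch.Size([2, 3])
```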
Rejecting abyssal thinking in the language and education of racialized bilinguals: A manifesto
Following Boaventura de Sousa Santos, the authors of this article reject the type of “abyssal thinking” that erases the existence of counter-hegemonic knowledges and lifeways, adopting instead the “from the inside out” perspective that is required for ...
Ofelia García +5 more
Five sources of bias in natural language processing
Recently, there has been an increased interest in demographically grounded bias in natural language processing (NLP) applications. Much of the recent work has focused on describing bias and providing an overview of bias in a larger context.
E. Hovy, Shrimai Prabhumoye

