Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing [PDF]
This article surveys and organizes research works in a new paradigm in natural language processing, which we dub “prompt-based learning.” Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x),
Pengfei Liu +5 more
Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [PDF]
Spurred by advancements in scale, large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot -- i.e., without adaptation on downstream data.
Chengwei Qin +5 more
Recent Advances in Natural Language Processing via Large Pre-trained Language Models: A Survey [PDF]
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically changed the Natural Language Processing (NLP) field. For numerous NLP tasks, approaches leveraging PLMs have achieved state-of-the-art performance. The key idea is to learn a
Bonan Min +8 more
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [PDF]
Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general domain corpora, such as newswire and Web.
Yu Gu +8 more
Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing [PDF]
Multi-modal data abounds in biomedicine, such as radiology images and reports. Interpreting this data at scale is essential for improving clinical care and accelerating clinical research.
Benedikt Boecking +11 more
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages [PDF]
We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization ...
Peng Qi +4 more
PyThaiNLP: Thai Natural Language Processing in Python [PDF]
We present PyThaiNLP, a free and open-source natural language processing (NLP) library for Thai language implemented in Python. It provides a wide range of software, models, and datasets for Thai language.
Wannaphong Phatthiyaphaibun +8 more
Datasets: A Community Library for Natural Language Processing [PDF]
The scale, variety, and quantity of publicly-available NLP datasets has grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets
Quentin Lhoest +31 more
Data Augmentation Approaches in Natural Language Processing: A Survey [PDF]
As an effective strategy, data augmentation (DA) alleviates data scarcity scenarios where deep learning techniques may fail. It is widely applied in computer vision then introduced to natural language processing and achieves improvements in many tasks ...
Bohan Li, Yutai Hou, Wanxiang Che
Multimodal Classification of Safety-Report Observations
Modern businesses are obligated to conform to regulations to prevent physical injuries and ill health for anyone present on a site under their responsibility, such as customers, employees and visitors.
Georgios Paraskevopoulos +4 more