Data Augmentation for Low-Resource Neural Machine Translation [PDF]
The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we ...
Bisazza, Arianna +2 more
core +4 more sources
A “Law” of occurrences for words of low frequency
Summary: The way in which the number of words occurring once, twice, three times, and so on in a text is related to the vocabulary of the author has been investigated. It is shown that a simple relationship holds under more general conditions than those implied by Zipf's law.
A. Booth
openaire +2 more sources
The effects of sentence structure on the recall of low frequency words [PDF]
The recall of low frequency words presented in either sentence order or random order was found to interact with trials. The recall of words presented in sentence order was inferior on early trials but superior on later trials.
W. L. Porter
openaire +2 more sources
Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation [PDF]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models. However, there exists a discrepancy in low-frequency words between the distilled and the original data, leading to more ...
Liang Ding +5 more
semanticscholar +1 more source
Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words [PDF]
Cosine similarity of contextual embeddings is used in many NLP tasks (e.g., QA, IR, MT) and metrics (e.g., BERTScore). Here, we uncover systematic ways in which word similarities estimated by cosine over BERT embeddings are understated and trace this ...
Kaitlyn Zhou +3 more
semanticscholar +1 more source
Knowledge distillation (KD) is the preliminary step for training non-autoregressive translation (NAT) models, which eases the training of NAT models at the cost of losing important information for translating low-frequency words. In this work, we provide ...
Liang Ding +4 more
semanticscholar +1 more source
Hapax remains: Regularity of low-frequency words in authorial texts [PDF]
Abstract This article highlights how the literature commonly overlooks regular occurrences of low-frequency words (hapax legomena) in specific authors’ texts. This oversight arises from a linguistic assumption that low-frequency word occurrences in extensive texts are non-systematic and context-dependent, and from the tendency of SVM methods ...
Dan Faltýnek, Vladimír Matlach
openaire +1 more source
This paper reports the results of a study that investigated the effectiveness of guessing as a word-solving strategy. Conducted with 32 non-native English language teacher trainees, this study discovered that with a vocabulary size of 3526 word ...
Siusana Kweldju
doaj +1 more source
Word frequency influences on the list length effect and associative memory in young and older adults [PDF]
Many studies show that age deficits in memory are smaller for information supported by preexperimental experience. Many studies also find dissociations in memory tasks between words that occur with high and low frequencies in language, but the literature ...
Badham, Stephen P. +3 more
core +3 more sources
Bayesian estimation‐based sentiment word embedding model for sentiment analysis
Sentiment word embedding has been extensively studied and used in sentiment analysis tasks. However, most existing models have failed to differentiate high‐frequency and low‐frequency words.
Jingyao Tang +7 more
doaj +1 more source