FullStop: Punctuation and Segmentation Prediction for Dutch with Transformers [PDF]
When automated speech recognition (ASR) is applied to Belgian Dutch (Van Dyck et al. 2021), the output is an unsegmented stream of words without any punctuation. The next step is to segment the stream and insert punctuation, making the ASR output more readable and easier to correct manually.
arxiv
The search for longitude: Preliminary insights from a 17th Century Dutch perspective [PDF]
In the 17th Century, the Dutch Republic played an important role in the scientific revolution. Much of the correspondence among contemporary scientists and their associates is now digitally available through the ePistolarium webtool, allowing current scientists and historians unfettered access to transcriptions of some 20,000 letters from the Dutch ...
arxiv
A Dutch book argument for belief consistency [PDF]
An agent progressively learns about a state of the world. A bookmaker is ready to offer one bet after every new discovery. I say that the agent is Dutch-booked when she is willing to accept every single bet, but her expected payoff is negative under each state, where the expected payoff is computed with the objective probabilities of different ...
arxiv
belabBERT: a Dutch RoBERTa-based language model applied to psychiatric classification [PDF]
Natural language processing (NLP) is becoming an important means for automatic recognition of human traits and states, such as intoxication, presence of psychiatric disorders, presence of airway disorders and states of stress. Such applications have the potential to be an important pillar for online help lines, and may gradually be introduced into ...
arxiv
RobBERT: a Dutch RoBERTa-based Language Model [PDF]
Pre-trained language models have been dominating the field of natural language processing in recent years, and have led to significant performance gains for various complex natural language tasks. One of the most prominent pre-trained language models is BERT, which was released as an English as well as a multilingual version. Although multilingual BERT
arxiv
Multi-Graph Decoding for Code-Switching ASR [PDF]
In the FAME! Project, a code-switching (CS) automatic speech recognition (ASR) system for Frisian-Dutch speech is developed that can accurately transcribe the local broadcaster's bilingual archives with CS speech. This archive contains recordings with monolingual Frisian and Dutch speech segments as well as Frisian-Dutch CS speech, hence the ...
arxiv
Opinion aspect extraction in Dutch childrens diary entries [PDF]
Aspect extraction can be used in dialogue systems to understand the topic of opinionated text. Expressing an empathetic reaction to an opinion can strengthen the bond between a human and, for example, a robot. The aim of this study is three-fold: 1. create a new annotated dataset for both aspect extraction and opinion words for Dutch children's language,
arxiv
Variability in the interpretation of Dutch probability phrases - a risk for miscommunication [PDF]
Verbal probability phrases are often used to express estimated risk. In this study, focus was on the numerical interpretation of 29 Dutch probability and frequency phrases, including several complementary phrases to test (a)symmetry in their interpretation. Many of these phrases had not been studied before.
arxiv
The merits of Universal Language Model Fine-tuning for Small Datasets -- a case with Dutch book reviews [PDF]
We evaluated the effectiveness of using language models that were pre-trained in one domain as the basis for a classification model in another domain: Dutch book reviews. Pre-trained language models have opened up new possibilities for classification tasks with limited labelled data, because representation can be learned in an unsupervised fashion ...
arxiv
ChocoLlama: Lessons Learned From Teaching Llamas Dutch [PDF]
While Large Language Models (LLMs) have shown remarkable capabilities in natural language understanding and generation, their performance often lags in lower-resource, non-English languages due to biases in the training data. In this work, we explore strategies for adapting the primarily English LLMs (Llama-2 and Llama-3) to Dutch, a language spoken by
arxiv