An Offline Corpus for Legal Translations into Italian: a Case Study with a Land Lease Agreement
Offline corpora are claimed to be helpful in technical translations. This paper explores whether a corpus of Italian land lease agreement samples and the civil code can be supportive in legal translations.
Patrizia Giampieri
Watermarking Fine-Tuning Datasets for Robust Provenance
Large Language Models are often fine-tuned on proprietary corpora, motivating reliable provenance signals. A corpus-level watermark method is proposed for fine-tuning datasets that survives training and common text transformations.
Ivo Gergov, Georgi Tsochev
Affective Polarization of a Protest and a Counterprotest: Million MAGA March v. Million Moron March [PDF]
Protest movements around the world have become increasingly likely to incite counterprotests that adopt an opposing stance. This study examines how a protest and a counterprotest interact with and shape each other as digitally networked connective action.
Saif Shahin +2 more
It is not as good as you think! Evaluating simultaneous machine translation on interpretation data [PDF]
Most existing simultaneous machine translation (SiMT) systems are trained and evaluated on offline translation corpora. We argue that SiMT systems should be trained and tested on real interpretation data.
Trevor Cohn +14 more
Investigating the Far-Right Online: Using Text Data to Understand Online Subcultures [PDF]
This contribution provides an introduction for social science researchers on the use of computational methods within investigative research for analysing large text corpora to develop an understanding of online communities and subcultures.
Brace, Lewys
Towards Feature Learning for HMM-based Offline Handwriting Recognition [PDF]
Statistical modelling techniques for automatic reading systems substantially rely on the availability of compact and meaningful feature representations. State-of-the-art feature extraction for offline handwriting recognition is usually based on heuristic
Hammerla, Nils Y. +3 more
Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions [PDF]
Enabling open-domain dialogue systems to ask clarifying questions when appropriate is an important direction for improving the quality of the system response.
Jeff Dalton +14 more
Human language reveals a universal positivity bias [PDF]
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a ...
Dodds, Peter Sheridan +14 more
Automatic offline annotation of turn-taking transitions in task-oriented dialogue
As the volume of recorded conversations continues to surge, so does the need for their automatic processing. Plenty of information beyond words may be extracted from the speech signal that could be valuable in domains such as call-center quality ...
Gravano, Agustin; Brusco, Pablo
This study examines lexical borrowing, code switching, and polylanguaging in Valencian Spanish to better understand how each is used differently in oral conversation in comparison with online communication on Twitter.
Lavender, Andrew Jordan

