An Offline Corpus for Legal Translations into Italian: a Case Study with a Land Lease Agreement

open access: yes, Altre Modernità, 2020
Offline corpora are claimed to be helpful in technical translations. This paper explores whether a corpus of Italian land lease agreement samples and the civil code can be supportive in legal translations.
Patrizia Giampieri
doaj   +3 more sources

Watermarking Fine-Tuning Datasets for Robust Provenance

open access: yes, Applied Sciences
Large Language Models are often fine-tuned on proprietary corpora, motivating reliable provenance signals. A corpus-level watermark method is proposed for fine-tuning datasets that survives training and common text transformations.
Ivo Gergov, Georgi Tsochev
doaj   +2 more sources

Affective Polarization of a Protest and a Counterprotest: Million MAGA March v. Million Moron March [PDF]

open access: yes, 2022
Protest movements around the world have become increasingly likely to incite counterprotests that adopt an opposing stance. This study examines how a protest and a counterprotest interact with and shape each other as digitally networked connective action.
Saif Shahin   +2 more
core   +1 more source

It is not as good as you think! Evaluating simultaneous machine translation on interpretation data [PDF]

open access: yes, 2021
Most existing simultaneous machine translation (SiMT) systems are trained and evaluated on offline translation corpora. We argue that SiMT systems should be trained and tested on real interpretation data.
Trevor Cohn   +14 more
core   +1 more source

Investigating the Far-Right Online: Using Text Data to Understand Online Subcultures [PDF]

open access: yes, 2022
This contribution provides an introduction for social science researchers on the use of computational methods within investigative research for analysing large text corpora to develop an understanding of online communities and subcultures.
Brace, Lewys
core   +2 more sources

Towards Feature Learning for HMM-based Offline Handwriting Recognition [PDF]

open access: yes, 2011
Statistical modelling techniques for automatic reading systems substantially rely on the availability of compact and meaningful feature representations. State-of-the-art feature extraction for offline handwriting recognition is usually based on heuristic
Hammerla, Nils Y.   +3 more
core   +1 more source

Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions [PDF]

open access: yes, 2021
Enabling open-domain dialogue systems to ask clarifying questions when appropriate is an important direction for improving the quality of the system response.
Jeff Dalton   +14 more
core   +1 more source

Human language reveals a universal positivity bias [PDF]

open access: yes, 2014
Using human evaluation of 100,000 words spread across 24 corpora in 10 languages diverse in origin and culture, we present evidence of a deep imprint of human sociality in language, observing that (i) the words of natural human language possess a ...
Dodds, Peter Sheridan   +14 more
core   +2 more sources

Automatic offline annotation of turn-taking transitions in task-oriented dialogue

open access: yes, 2023
As the volume of recorded conversations continues to surge, so does the need for their automatic processing. Plenty of information beyond words may be extracted from the speech signal that could be valuable in domains such as call-center quality ...
Gravano, Agustin, Brusco, Pablo
core   +1 more source

Code switching, lexical borrowing, and polylanguaging in Valencian Spanish: an analysis of data from conversational corpora and Twitter

open access: yes, 2017
This study examines lexical borrowing, code switching, and polylanguaging in Valencian Spanish to better understand how each is used differently in oral conversation in comparison with online communication on Twitter.
Lavender, Andrew Jordan
core   +2 more sources
