Detoxifying Language Models with a Toxic Corpus [PDF]

open access: yes
arXiv, 2022
Existing studies have investigated the tendency of autoregressive language models to generate contexts that exhibit undesired biases and toxicity. Various debiasing approaches have been proposed, which are primarily categorized into data-based and decoding-based. In our study, we investigate the ensemble of the two debiasing paradigms, proposing to use
arxiv  

ReSeTOX: Re-learning attention weights for toxicity mitigation in machine translation [PDF]

open access: yes
arXiv, 2023
Our proposed method, ReSeTOX (REdo SEarch if TOXic), addresses the issue of Neural Machine Translation (NMT) generating translation outputs that contain toxic words not present in the input. The objective is to mitigate the introduction of toxic language without the need for re-training.
arxiv  

Toxicity in Multilingual Machine Translation at Scale [PDF]

open access: yes
arXiv, 2022
Machine Translation systems can produce different types of errors, some of which are characterized as critical or catastrophic due to the specific negative impact that they can have on users. In this paper we focus on one type of critical error: added toxicity.
arxiv  

Twits, Toxic Tweets, and Tribal Tendencies: Trends in Politically Polarized Posts on Twitter [PDF]

open access: yes
arXiv, 2023
Social media platforms are often blamed for exacerbating political polarization and worsening public dialogue. Many claim that hyperpartisan users post pernicious content, slanted to their political views, inciting contentious and toxic conversations. However, what factors are actually associated with increased online toxicity and negative interactions?
arxiv  

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [PDF]

open access: yes
arXiv, 2020
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment. We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arxiv  

Towards Robust Toxic Content Classification [PDF]

open access: yes
arXiv, 2019
Toxic content detection aims to identify content that can offend or harm its recipients. Automated classifiers of toxic content need to be robust against adversaries who deliberately try to bypass filters. We propose a method of generating realistic model-agnostic attacks using a lexicon of toxic tokens, which attempts to mislead toxicity classifiers ...
arxiv  

Exploring Cyberbullying and Other Toxic Behavior in Team Competition Online Games [PDF]

open access: yes
arXiv, 2015
In this work we explore cyberbullying and other toxic behavior in team competition online games. Using a dataset of over 10 million player reports on 1.46 million toxic players along with corresponding crowdsourced decisions, we test several hypotheses drawn from theories explaining toxic behavior.
arxiv  
