Results 21 to 30 of about 353,387

Tracking Patterns in Toxicity and Antisocial Behavior Over User Lifetimes on Large Social Media Platforms [PDF]

open access: yes (arXiv)
An increasing amount of attention has been devoted to the problem of "toxic" or antisocial behavior on social media. In this paper we analyze such behavior at very large scale: toxicity over a 14-year time span in nearly 500 million comments from Reddit and Wikipedia, grounded in two different proxies for toxicity. At the individual level, ...
arxiv  
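
A toy sketch of the longitudinal analysis this abstract describes: bucket each comment by the author's tenure at posting time and average a toxicity proxy per bucket. The lexicon-based `toxicity_score` and all names below are placeholder assumptions, not the paper's actual proxies.

```python
# Toy sketch of tracking toxicity over user lifetimes: bucket each comment by
# the author's tenure (months since signup) and average a toxicity proxy per
# bucket. The lexicon proxy below is a placeholder, not the paper's proxies.
from collections import defaultdict
from statistics import mean

def toxicity_score(text: str) -> float:
    """Placeholder proxy: fraction of words drawn from a tiny toxic lexicon."""
    lexicon = {"idiot", "stupid", "hate"}  # illustrative only
    words = text.lower().split()
    return sum(w in lexicon for w in words) / max(len(words), 1)

def lifetime_profile(comments):
    """comments: iterable of (user_id, days_since_signup, text) tuples."""
    buckets = defaultdict(list)  # tenure month -> proxy scores
    for _user, days, text in comments:
        buckets[days // 30].append(toxicity_score(text))
    return {month: mean(scores) for month, scores in sorted(buckets.items())}

print(lifetime_profile([("u1", 3, "nice post"), ("u1", 400, "you idiot")]))
```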

Enhancing LLM-based Hatred and Toxicity Detection with Meta-Toxic Knowledge Graph [PDF]

open access: yes (arXiv)
The rapid growth of social media platforms has raised significant concerns regarding online content toxicity. When Large Language Models (LLMs) are used for toxicity detection, two key challenges emerge: 1) the absence of domain-specific toxic knowledge leads to false negatives; 2) the excessive sensitivity of LLMs to toxic speech results in false ...
arxiv  
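
A minimal sketch of the retrieval-augmented detection idea this abstract describes, assuming a toy triple store and keyword-overlap retrieval; the triples, retrieval rule, and prompt wording are illustrative, not the paper's pipeline.

```python
# Toy sketch of knowledge-graph-augmented detection: retrieve relevant
# "meta-toxic" triples and prepend them to the detection prompt. The triples,
# keyword retrieval, and prompt format are illustrative assumptions.
KNOWLEDGE_GRAPH = [
    ("dog whistle", "is_a", "coded toxic phrase"),
    ("reclaimed slur", "can_be", "non-toxic in in-group use"),
]

def retrieve_triples(comment: str, k: int = 2):
    """Naive keyword-overlap retrieval over the triple store (placeholder)."""
    scored = [(sum(tok in comment.lower() for tok in s.split()), (s, r, o))
              for s, r, o in KNOWLEDGE_GRAPH]
    return [t for score, t in sorted(scored, reverse=True)[:k] if score > 0]

def build_prompt(comment: str) -> str:
    facts = "\n".join(f"- {s} {r.replace('_', ' ')} {o}"
                      for s, r, o in retrieve_triples(comment))
    return (f"Background knowledge:\n{facts or '- (none)'}\n\n"
            f"Comment: {comment}\n"
            "Is this comment toxic? Answer 'toxic' or 'non-toxic' with a reason.")

print(build_prompt("That slogan sounds like a dog whistle."))
```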

The Landscape of Toxicity: An Empirical Investigation of Toxicity on GitHub [PDF]

open access: yes (arXiv)
Toxicity on GitHub can severely impact Open Source Software (OSS) development communities. To mitigate such behavior, we need a better understanding of its nature and of how measurable characteristics of project contexts and participants are associated with its prevalence.
arxiv  

Can LLMs Recognize Toxicity? A Structured Investigation Framework and Toxicity Metric [PDF]

open access: yes (EMNLP 2024 Findings)
In the pursuit of developing Large Language Models (LLMs) that adhere to societal standards, it is imperative to detect the toxicity in the generated text. The majority of existing toxicity metrics rely on encoder models trained on specific toxicity datasets, which are susceptible to out-of-distribution (OOD) problems and depend on the dataset's ...
arxiv  
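
One way to picture an LLM-based toxicity metric of this kind is a structured rubric prompt plus numeric score parsing. The rubric factors and the `call_llm` stub below are assumptions for illustration, not the paper's actual framework.

```python
# Sketch of an LLM-as-toxicity-metric: a structured rubric prompt and numeric
# score parsing. `call_llm` is a stand-in for any chat-model API call.
import re

RUBRIC = (
    "Rate the toxicity of the text on a 0-10 scale, considering:\n"
    "1) insults or profanity, 2) threats, 3) identity attacks.\n"
    "Reply with a single line: SCORE: <number>.\n\nText: {text}"
)

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns a canned reply here."""
    return "SCORE: 2"

def toxicity_metric(text: str) -> float:
    reply = call_llm(RUBRIC.format(text=text))
    match = re.search(r"SCORE:\s*(\d+(?:\.\d+)?)", reply)
    if match is None:
        raise ValueError(f"unparseable reply: {reply!r}")
    return float(match.group(1)) / 10.0  # normalize to [0, 1]

print(toxicity_metric("example comment"))
```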

Beyond Toxic Neurons: A Mechanistic Analysis of DPO for Toxicity Reduction [PDF]

open access: yes (NeurIPS 2024 Workshop on Socially Responsible Language Modelling Research, SoLaR)
Safety fine-tuning algorithms are widely used to reduce harmful outputs in language models, but how they achieve this remains unclear. For Direct Preference Optimization (DPO) applied to toxicity reduction, current explanations claim that the algorithm works by dampening the activations of toxic MLP neurons.
arxiv  
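
A sketch of the kind of mechanistic check this line of work relies on: read out a specific MLP neuron's post-GELU activation with a forward hook and compare base vs. safety-tuned checkpoints. The layer/neuron coordinates and the DPO checkpoint name are hypothetical; only `gpt2` is a real model ID.

```python
# Sketch of a mechanistic check on "toxic neurons": capture one MLP neuron's
# post-GELU activation via a forward hook. LAYER/NEURON are hypothetical
# coordinates, not neurons identified by the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

LAYER, NEURON = 11, 300  # placeholder "toxic neuron" location (gpt2 has 12 layers)

def mean_activation(model_name: str, prompt: str) -> float:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()
    captured = {}
    def hook(_module, _inputs, output):  # output: (batch, seq, 4 * hidden)
        captured["value"] = output[0, :, NEURON].mean().item()
    handle = model.transformer.h[LAYER].mlp.act.register_forward_hook(hook)
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return captured["value"]

print(mean_activation("gpt2", "I can't believe you would"))
# To test the dampening claim, run the same probe on a DPO-tuned checkpoint
# (hypothetical name) and compare:
# print(mean_activation("your-org/gpt2-dpo-detox", "I can't believe you would"))
```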

Deceiving Google's Perspective API Built for Detecting Toxic Comments [PDF]

open access: yes (arXiv, 2017)
Social media platforms provide an environment where people can freely engage in discussions. Unfortunately, they also enable several problems, such as online harassment. Recently, Google and Jigsaw started a project called Perspective, which uses machine learning to automatically detect toxic language.
arxiv  
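
The attack in this paper perturbed toxic comments at the character level so they stay readable to humans but evade the classifier. A toy sketch of one such perturbation follows; scoring against the real Perspective API (a keyed REST endpoint) is deliberately left out.

```python
# Toy version of the character-level perturbations used to evade toxicity
# classifiers: insert a dot inside longer words so humans still read the
# word but the model's learned token patterns no longer match.
import random

def perturb(text: str, rate: float = 0.3, seed: int = 1) -> str:
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if len(word) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(word) - 1)
            word = word[:i] + "." + word[i:]  # e.g., "idiot" -> "id.iot"
        out.append(word)
    return " ".join(out)

print(perturb("you are an idiot"))
# Comparing original vs. perturbed scores would go through the real
# Perspective API (commentanalyzer.googleapis.com, requires an API key),
# which is omitted here.
```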

DAPI: Domain Adaptive Toxicity Probe Vector Intervention for Fine-Grained Detoxification [PDF]

open access: yes (arXiv)
There have been attempts to utilize linear probes for detoxification, with existing studies relying on a single toxicity probe vector to reduce toxicity. However, toxicity splits into various fine-grained subcategories, making it difficult to remove certain types of toxicity with a single probe vector. To address this limitation, we propose ...
arxiv  
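
A minimal sketch of the probe-vector intervention idea as the abstract frames it: keep one probe vector per toxicity subcategory and project only the matching component out of a hidden state. The random vectors and the projection-removal rule are illustrative assumptions about the method.

```python
# Sketch of fine-grained probe-vector detoxification: one probe direction per
# toxicity subcategory; the intervention removes only the matching component
# from a hidden state. Vectors here are random stand-ins for learned probes.
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 768  # assumed hidden size

def unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

# Hypothetical learned probe vectors, one per subcategory
PROBES = {name: unit(rng.standard_normal(HIDDEN))
          for name in ("insult", "threat", "identity_attack")}

def intervene(h: np.ndarray, category: str, alpha: float = 1.0) -> np.ndarray:
    """Subtract the component of h along the chosen subcategory's probe."""
    v = PROBES[category]
    return h - alpha * (h @ v) * v

h = rng.standard_normal(HIDDEN)
h_clean = intervene(h, "insult")
# Projection along the chosen probe drops to ~0; others are untouched.
print(round(float(h @ PROBES["insult"]), 4),
      round(float(h_clean @ PROBES["insult"]), 4))
```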
