Tracking Patterns in Toxicity and Antisocial Behavior Over User Lifetimes on Large Social Media Platforms [PDF]
An increasing amount of attention has been devoted to the problem of "toxic" or antisocial behavior on social media. In this paper we analyze such behavior at very large scale: toxicity over a 14-year time span in nearly 500 million comments from Reddit and Wikipedia, grounded in two different proxies for toxicity. At the individual level, ...
arxiv
Enhancing LLM-based Hatred and Toxicity Detection with Meta-Toxic Knowledge Graph [PDF]
The rapid growth of social media platforms has raised significant concerns regarding online content toxicity. When Large Language Models (LLMs) are used for toxicity detection, two key challenges emerge: 1) the absence of domain-specific toxic knowledge leads to false negatives; 2) the excessive sensitivity of LLMs to toxic speech results in false ...
arxiv
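The abstract points to grounding an LLM's toxicity judgment in a knowledge graph. A highly schematic sketch of that general idea follows; the toy graph, the keyword-matching retrieval, and the prompt template are all invented for illustration and are not the paper's actual method.

```python
# Toy "meta-toxic" knowledge triples; a real system would use a curated graph.
TOXIC_KG = [
    ("snowflake", "used_as", "a derogatory label in political flaming"),
    ("go back to", "signals", "xenophobic harassment"),
]

def build_prompt(comment: str) -> str:
    """Retrieve matching triples and prepend them to a detection prompt."""
    facts = [f"- {h} {r} {t}" for h, r, t in TOXIC_KG if h in comment.lower()]
    context = "\n".join(facts) if facts else "(no matching knowledge)"
    return (f"Relevant toxic-usage knowledge:\n{context}\n\n"
            f"Comment: {comment}\nIs this comment toxic? Answer yes or no.")

print(build_prompt("ok snowflake, whatever you say"))
```

Supplying such domain-specific knowledge targets the first failure mode the abstract names: false negatives from an LLM that lacks toxic-usage context.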
A Fraction of Tubercle Bacilli Possessing Primary Toxicity [PDF]
John K. Spitznagel, René Dubos
openalex
The Landscape of Toxicity: An Empirical Investigation of Toxicity on GitHub [PDF]
Toxicity on GitHub can severely impact Open Source Software (OSS) development communities. Mitigating such behavior requires a better understanding of its nature and of how measurable characteristics of project contexts and participants are associated with its prevalence.
arxiv
The toxicity of sodium tartrate with special reference to diet and tolerance [PDF]
William Salant, Cheryl S. Smith
openalex
On the Beneficent and Toxical Effects of the Various Species of Rhus [PDF]
Tawny Burgess
openalex
Can LLMs Recognize Toxicity? A Structured Investigation Framework and Toxicity Metric [PDF]
In the pursuit of developing Large Language Models (LLMs) that adhere to societal standards, it is imperative to detect toxicity in the generated text. The majority of existing toxicity metrics rely on encoder models trained on specific toxicity datasets, which are susceptible to out-of-distribution (OOD) problems and depend on the dataset's ...
arxiv
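As a concrete instance of the encoder-based metrics this abstract critiques, the sketch below scores text with an off-the-shelf toxicity classifier via Hugging Face transformers. The model name unitary/toxic-bert is an illustrative public checkpoint, not one the paper uses or endorses.

```python
from transformers import pipeline

# Off-the-shelf encoder toxicity classifier; any similar checkpoint would do.
clf = pipeline("text-classification", model="unitary/toxic-bert")

for text in ["Have a great day!", "You are an idiot."]:
    result = clf(text)[0]
    print(f"{result['label']}: {result['score']:.3f}  <- {text!r}")
```

Such a classifier inherits exactly the weakness the abstract describes: its scores degrade on domains and slang absent from its training data.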
Beyond Toxic Neurons: A Mechanistic Analysis of DPO for Toxicity Reduction [PDF]
Safety fine-tuning algorithms are widely used to reduce harmful outputs in language models, but how they achieve this remains unclear. Studying the Direct Preference Optimization (DPO) algorithm for toxicity reduction, current explanations claim that DPO achieves this by dampening the activations of toxic MLP neurons.
arxiv
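For reference, the standard DPO objective the paper studies can be written over per-sequence log-probabilities. The sketch below (assuming PyTorch and precomputed log-probs) follows the published DPO loss; it is not code from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO: make the policy prefer chosen over rejected responses relative
    to a frozen reference model, with beta scaling the implicit reward."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # -log sigmoid(beta * margin difference); logsigmoid is numerically stable
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy batch of two preference pairs with made-up log-probabilities
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-11.0, -10.2]),
                torch.tensor([-12.5, -9.8]), torch.tensor([-10.8, -10.0]))
print(loss.item())
```

The mechanistic question the paper asks is how minimizing this loss changes internal activations, beyond the "dampened toxic MLP neurons" account.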
Deceiving Google's Perspective API Built for Detecting Toxic Comments [PDF]
Social media platforms provide an environment where people can freely engage in discussions. Unfortunately, they also enable several problems, such as online harassment. Recently, Google and Jigsaw started a project called Perspective, which uses machine learning to automatically detect toxic language.
arxiv
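Perspective is a public REST API; a minimal query looks like the sketch below, with the request and response shape following the public API documentation and a placeholder API key. The paper's attack perturbs the input text sent to such queries; the attack itself is not reproduced here.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; obtain a real key from Google Cloud
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

payload = {
    "comment": {"text": "You are a terrible person."},
    "languages": ["en"],
    "requestedAttributes": {"TOXICITY": {}},
}
resp = requests.post(URL, json=payload, timeout=10)
resp.raise_for_status()
score = resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"TOXICITY = {score:.3f}")
```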
DAPI: Domain Adaptive Toxicity Probe Vector Intervention for Fine-Grained Detoxification [PDF]
There have been attempts to utilize linear probes for detoxification, with existing studies relying on a single toxicity probe vector to reduce toxicity. However, toxicity comprises various fine-grained subcategories, making it difficult to remove certain types of toxicity using a single toxicity probe vector. To address this limitation, we propose ...
arxiv
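The single-vector baseline that DAPI generalizes is straightforward to sketch: train a linear probe to separate toxic from non-toxic activations, then subtract each hidden state's component along that direction at inference time. In the sketch below the probe is a random stand-in purely for illustration; in practice it would be a learned direction.

```python
import torch

def ablate_direction(hidden, probe, alpha=1.0):
    """Remove alpha times the component of `hidden` along unit vector `probe`.
    hidden: (..., d) activations; probe: (d,) learned toxicity direction."""
    probe = probe / probe.norm()
    coeff = hidden @ probe                      # (...) projection coefficients
    return hidden - alpha * coeff.unsqueeze(-1) * probe

d = 768
hidden = torch.randn(4, 16, d)                  # (batch, seq, dim) stand-in
probe = torch.randn(d)                          # stand-in toxicity direction
steered = ablate_direction(hidden, probe)
print((steered @ (probe / probe.norm())).abs().max())  # ~0: component removed
```

The paper's point is that one such direction cannot capture every toxicity subcategory, motivating multiple domain-adaptive probe vectors instead.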