Results 271 to 280 of about 4,786,818
Some of the following articles may not be open access.

SEXUAL CONTENT MODERATION

AoIR Selected Papers of Internet Research
Despite popular understandings of the internet as teeming with pornography and offering safe harbor for LGBTQIA+ content, social media platforms are cracking down on sexual expression and sex work online. This panel examines the role that content moderation
Alexander Monea   +4 more
semanticscholar   +2 more sources

Content Moderation

2023
As terrorist, extremist, and hateful content has become widespread on social media, platforms have responded with content moderation – the flagging, review, and enforcement of rules and standards on user-generated content online. This chapter provides an introduction to contemporary content moderation practices, technologies, and contexts and outlines ...
Nachshon (Sean) Goltz   +3 more
  +6 more sources

Policy-as-Prompt: Rethinking Content Moderation in the Age of Large Language Models

Conference on Fairness, Accountability and Transparency
Content moderation plays a critical role in shaping safe and inclusive online environments, balancing platform standards, user expectations, and regulatory frameworks.
Konstantina Palla   +7 more
semanticscholar   +1 more source

BingoGuard: LLM Content Moderation Tools with Risk Levels

International Conference on Learning Representations
Malicious content generated by large language models (LLMs) can pose varying degrees of harm. Although existing LLM-based moderators can detect harmful content, they struggle to assess risk levels and may miss lower-risk outputs. Accurate risk assessment
Fan Yin   +9 more
semanticscholar   +1 more source

PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models

arXiv.org
Recent text-to-image (T2I) models have exhibited remarkable performance in generating high-quality images from text descriptions. However, these models are vulnerable to misuse, particularly generating not-safe-for-work (NSFW) content, such as sexually ...
Lingzhi Yuan   +9 more
semanticscholar   +1 more source

ShieldGemma: Generative AI Content Moderation Based on Gemma

arXiv.org
We present ShieldGemma, a comprehensive suite of LLM-based safety content moderation models built upon Gemma2. These models provide robust, state-of-the-art predictions of safety risks across key harm types (sexually explicit, dangerous content ...
Wenjun Zeng   +11 more
semanticscholar   +1 more source

ShieldGemma 2: Robust and Tractable Image Content Moderation

arXiv.org
We introduce ShieldGemma 2, a 4B parameter image content moderation model built on Gemma 3. This model provides robust safety risk predictions across the following key harm categories: Sexually Explicit, Violence & Gore, and Dangerous Content for ...
Wenjun Zeng   +16 more
semanticscholar   +1 more source

The End of Trust and Safety?: Examining the Future of Content Moderation and Upheavals in Professional Online Safety Efforts

International Conference on Human Factors in Computing Systems
Trust & Safety (T&S) teams have become vital parts of tech platforms, ensuring safe platform use and combating abuse, harassment, and misinformation. However, between 2021 and 2023, T&S teams faced significant layoffs, impacted by broader downsizing in ...
Rachel Moran   +3 more
semanticscholar   +1 more source

X-Guard: Multilingual Guard Agent for Content Moderation

arXiv.org
Large Language Models (LLMs) have rapidly become integral to numerous applications in critical domains where reliability is paramount. Despite significant advances in safety frameworks and guardrails, current protective measures exhibit crucial ...
Bibek Upadhayay   +53 more
semanticscholar   +1 more source

Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations

International Conference on Human Factors in Computing Systems
Commercial content moderation APIs are marketed as scalable solutions to combat online hate speech. However, the reliance on these APIs risks both silencing legitimate speech, called over-moderation, and failing to protect online platforms from harmful ...
David Hartmann   +5 more
semanticscholar   +1 more source
