Some of the following articles may not be open access.

Who moderates the moderators?

Proceedings of the 12th ACM conference on Electronic commerce, 2011
A large fraction of user-generated content on the Web, such as posts or comments on popular online forums, consists of abuse or spam. Due to the volume of contributions on popular sites, a few trusted moderators cannot identify all such abusive content, so viewer ratings of contributions must be used for moderation. But not all viewers who rate content
Arpita Ghosh   +2 more
openaire   +1 more source

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Neural Information Processing Systems
We introduce WildGuard -- an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate.
Seungju Han   +7 more
semanticscholar   +1 more source

Do Not Recommend? Reduction as a Form of Content Moderation

Social Media + Society, 2022
Public debate about content moderation has overwhelmingly focused on removal: social media platforms deleting content and suspending users, or opting not to do so. However, removal is not the only available remedy.
Tarleton Gillespie
semanticscholar   +1 more source

The moderator's moderator

Materials Science and Technology, 1990
A brief account is given of the development stages of graphite moderators for Magnox and advanced gas-cooled reactors. The accident at Windscale in 1957 brought to worldwide attention the importance of irradiation damage in graphite and the consequent storage of Wigner energy.
openaire   +1 more source

ShieldGemma: Generative AI Content Moderation Based on Gemma

arXiv.org
We present ShieldGemma, a comprehensive suite of LLM-based safety content moderation models built upon Gemma2. These models provide robust, state-of-the-art predictions of safety risks across key harm types (sexually explicit, dangerous content ...
Wenjun Zeng   +11 more
semanticscholar   +1 more source

Policy-as-Prompt: Rethinking Content Moderation in the Age of Large Language Models

Conference on Fairness, Accountability and Transparency
Content moderation plays a critical role in shaping safe and inclusive online environments, balancing platform standards, user expectations, and regulatory frameworks.
Konstantina Palla   +7 more
semanticscholar   +1 more source

Religious moderation in Instagram: An Islamic interpretation perspective

Heliyon
Religious freedom and plurality remain major challenges in Indonesia, with both authorities and social media influencers involved. One potential solution is integrating moderation into religious activities, especially through platforms like Instagram ...
A. Hadiyanto   +2 more
semanticscholar   +1 more source

Moderate presentism

Philosophical Studies, 2015
Typical presentism asserts that whatever exists is present. Moderate presentism more modestly claims that all events are present and thus acknowledges past and future times understood in a substantivalist sense, and past objects understood, following Williamson, as “ex-concrete.” It is argued that moderate presentism retains the most valuable features ...
openaire   +2 more sources

AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts

arXiv.org
As Large Language Models (LLMs) and generative AI become more widespread, the content safety risks associated with their use also increase. We find a notable deficiency in high-quality content safety datasets and benchmarks that comprehensively cover a ...
Shaona Ghosh   +3 more
semanticscholar   +1 more source

Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters

Neural Information Processing Systems
Large Language Models (LLMs) are typically harmless but remain vulnerable to carefully crafted prompts known as “jailbreaks”, which can bypass protective measures and induce harmful behavior.
Haibo Jin   +3 more
semanticscholar   +1 more source
