Published April 7, 2023 | Version 3
Dataset Open

Dataset of inappropriate utterances on sensitive topics in Russian

  • 1. Babakov
  • 2. Logacheva
  • 3. Panchenko

Description

This dataset is dedicated to inappropriate messages: messages on sensitive topics that can frustrate the reader and/or harm the reputation of the speaker. The concept of inappropriateness is close to toxicity; however, overt toxicity and explicit obscenity have been intentionally excluded from this dataset.

Not all messages related to sensitive topics are inappropriate. For example, a message about racism may either attack someone or defend them. The main aim of this dataset is to support the detection of appropriate and inappropriate utterances within known sensitive topics.

Files

Files (23.7 MB)

Name: Inappapropriate_messages.csv
Size: 23.7 MB
md5:d6d885edd3118802ce17b74952427437
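
The md5 value listed above can be used to confirm that a downloaded copy of the file is intact. The following is a minimal sketch, assuming the file has been saved to the working directory under its listed name; the helper names `md5_of_file` and `verify_download` are hypothetical and are not part of the dataset or of any tooling it ships with.

```python
import hashlib
from pathlib import Path

# Checksum and file name as listed in the Files section of this record.
EXPECTED_MD5 = "d6d885edd3118802ce17b74952427437"
CSV_NAME = "Inappapropriate_messages.csv"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the CSV was downloaded into the current directory.
    csv_path = Path(CSV_NAME)
    if not csv_path.exists():
        print(f"{csv_path} not found; download it first")
    elif verify_download(csv_path):
        print("checksum OK")
    else:
        print("checksum mismatch; the download may be corrupted")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.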