WikiHarass
From Algolit
Revision as of 15:02, 25 October 2017
Type: Dataset
Developed by: English Wikipedia
Wikimedia and Perspective API used the Detox dataset to train a neural network that estimates how toxic a comment is.
The dataset consists of:
- A corpus of all 95 million user and article talk diffs made between 2001 and 2015, each scored by the personal attack model.
- A human-annotated dataset of 1 million crowd-sourced annotations covering 100,000 talk page diffs (10 judgements per diff).
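
Because each diff carries several independent judgements, those judgements have to be combined into one label per diff before they can serve as training data. The sketch below shows one possible way to do that, a simple majority vote. It is an illustration only: the tuple layout, the field names, the toy values and the 0.5 threshold are all assumptions, and none of them is taken from the dataset files or from the method Wikimedia actually used.

```python
from collections import defaultdict

# Hypothetical toy annotations: (diff_id, worker_id, is_attack).
# Field names and values are illustrative, not copied from the dataset.
annotations = [
    (1, "w1", 1), (1, "w2", 1), (1, "w3", 0),
    (2, "w1", 0), (2, "w2", 0), (2, "w3", 1),
]


def aggregate(rows, threshold=0.5):
    """Return {diff_id: (fraction_flagged, majority_label)}.

    The threshold is an assumed cut-off for this sketch.
    """
    votes = defaultdict(list)
    for diff_id, _worker_id, is_attack in rows:
        votes[diff_id].append(is_attack)

    result = {}
    for diff_id, diff_votes in votes.items():
        # Share of annotators who flagged this diff as a personal attack.
        fraction = sum(diff_votes) / len(diff_votes)
        result[diff_id] = (fraction, fraction > threshold)
    return result


labels = aggregate(annotations)
for diff_id, (fraction, is_attack) in sorted(labels.items()):
    print(diff_id, round(fraction, 2), is_attack)
```

Keeping the fraction alongside the binary label preserves how strongly the annotators agreed, which a plain majority vote would discard.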