Paper information - Elusive vandalism detection in wikipedia: a text stability-based approach

Elusive vandalism detection in wikipedia: a text stability-based approach

The open, collaborative nature of wikis encourages participation by all users, but it also exposes their content to vandalism. Current vandalism-detection techniques, while effective against relatively obvious vandal edits, prove inadequate for detecting the increasingly prevalent sophisticated (or elusive) vandal edits. We identify a number of vandal edits that can take hours, or even days, to correct, and we propose a text stability-based approach for detecting them. Our approach focuses on the likelihood that a given part of an article will be modified by a regular edit. In addition to text stability, our machine learning-based technique also takes edit patterns into account. We evaluate the performance of our approach on a corpus of 15,000 manually labeled edits from the Wikipedia Vandalism PAN corpus. The experimental results show that text stability significantly improves the performance of the selected machine-learning algorithms.
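
The abstract does not spell out how text stability is computed, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that a token's stability can be approximated by the number of consecutive revisions it has survived, and that an edit touching long-lived tokens is more suspicious than one touching recently added text. The function names `token_ages` and `edit_stability_score` are hypothetical, and revisions are assumed to be pre-tokenized lists of words.

```python
from difflib import SequenceMatcher


def token_ages(revisions):
    """For each token of the last revision, count how many consecutive
    revisions it has survived unchanged (a crude stability proxy;
    an assumption, not the paper's definition)."""
    ages = [0] * len(revisions[0])
    for old, new in zip(revisions, revisions[1:]):
        new_ages = [0] * len(new)  # newly inserted tokens start at age 0
        matcher = SequenceMatcher(a=old, b=new, autojunk=False)
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                # A token carried over from the old revision gets one step older.
                new_ages[block.b + k] = ages[block.a + k] + 1
        ages = new_ages
    return ages


def edit_stability_score(current, ages, edited):
    """Mean age of the tokens that an edit deletes or replaces.
    Returns 0.0 when the edit only inserts text."""
    matcher = SequenceMatcher(a=current, b=edited, autojunk=False)
    touched = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            touched.extend(ages[i1:i2])
    return sum(touched) / len(touched) if touched else 0.0
```

In a setup like the one the abstract describes, a score of this kind would be one feature among others (such as edit-pattern features) passed to a classifier, rather than a detector on its own.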
