On the Use of PU Learning for Quality Flaw Prediction in Wikipedia

Edgardo Ferretti and Marcelo Errecalde thank Universidad Nacional de San Luis (PROICO 30310). The collaboration of UNSL, INAOE and UPV has been funded by the European Commission as part of the WIQ-EI project (project no. 269180) within the FP7 People Programme. Manuel Montes is partially supported by CONACYT, No. 134186. The work of Paolo Rosso was also carried out within the framework of the MICINN Text-Enterprise (TIN2009-13391-C04-03) research project and the Microcluster VLC/Campus (International Campus of Excellence) on Multimodal Intelligent Systems.
