History-Based Article Quality Assessment on Wikipedia

Wikipedia is widely considered the largest encyclopedia on the Internet. Quality assessment of Wikipedia articles has been studied for years. Conventional methods address this task with feature engineering and statistical machine learning algorithms. However, manually defined features struggle to represent the long edit history of an article. Recently, researchers proposed an end-to-end neural model that uses a Recurrent Neural Network (RNN) to learn the representation automatically. Although the RNN proved effective at modeling edit history, the end-to-end method is time- and resource-consuming. In this paper, we propose a new history-based method to represent an article. We also use an RNN to handle the long edit history, but we do not abandon feature engineering: each revision of an article is still represented by manually defined features. This combination of a deep neural model and feature engineering makes our model both simple and effective. Experiments demonstrate that our model performs better than or comparably to previous work and has the potential to run as a real-time service. In addition, we extend our model to quality prediction.
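The history-based idea described in the abstract can be illustrated with a minimal sketch: each revision is mapped to a small vector of hand-crafted features, and a recurrent network consumes the sequence of vectors to produce a distribution over quality classes. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three features (`revision_features`), the plain tanh recurrent cell (`TinyRNN`), the six output classes, and the random, untrained weights.

```python
import math
import random


def revision_features(wikitext):
    """Hypothetical hand-crafted features for one revision (illustrative only)."""
    n_chars = len(wikitext)
    # Count heading lines of any level (wikitext headings start with "==").
    n_headings = sum(1 for line in wikitext.splitlines() if line.startswith("=="))
    n_refs = wikitext.count("<ref")
    # Log-scale the counts so very long revisions do not dominate.
    return [math.log1p(n_chars), math.log1p(n_headings), math.log1p(n_refs)]


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNN:
    """Plain tanh recurrent cell with random, untrained weights (assumption)."""

    def __init__(self, n_features, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.w_in = make_matrix(n_hidden, n_features, rng)
        self.w_rec = make_matrix(n_hidden, n_hidden, rng)
        self.w_out = make_matrix(n_classes, n_hidden, rng)
        self.n_hidden = n_hidden

    def forward(self, feature_sequence):
        hidden = [0.0] * self.n_hidden
        # One step per revision, oldest first; the final hidden state
        # summarizes the whole edit history.
        for features in feature_sequence:
            from_input = matvec(self.w_in, features)
            from_state = matvec(self.w_rec, hidden)
            hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
        return softmax(matvec(self.w_out, hidden))


# Toy edit history of one article, oldest revision first.
history = [
    "Short stub.",
    "== Intro ==\nLonger text.<ref>a</ref>",
    "== Intro ==\nLonger text.<ref>a</ref>\n== History ==\nMore.<ref>b</ref>",
]
model = TinyRNN(n_features=3, n_hidden=4, n_classes=6)
probs = model.forward([revision_features(rev) for rev in history])
print(probs)
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows how per-revision features and a recurrent pass over the history fit together. Since each revision is reduced to a short feature vector before the recurrent step, the network input stays small regardless of article length.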
