A Multi-view Approach for the Quality Assessment of Wiki Articles

Wikipedia is a prominent example of a very large repository of information with free access and open editing, created collaboratively by its community. However, this large amount of information, made available democratically and with virtually no control, raises questions about its quality. To deal with this problem, some studies attempt to assess the quality of Wikipedia articles automatically. In these studies, a large number of quality indicators is usually collected and then combined to obtain a single value representing the quality of the article. In this work, we propose to group these indicators into semantically meaningful views of quality and investigate a new approach to combining these views based on a meta-learning method known as stacking. Specifically, we group the indicators into three views (textual, review history, and citation graph) and demonstrate that this approach can be used in collaborative encyclopedias such as Wikipedia and Wikia. In our experimental evaluation, we obtained gains of up to 18% compared to the state-of-the-art quality assessment method that considers all indicators at once.
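The view-based stacking idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this work: the per-view learners here are plain ridge-regularised least-squares regressors chosen only to keep the example dependency-free, the data are synthetic, and all names (`fit_linear`, `fit_stacked`, `predict_stacked`, `make_synthetic`, and the view keys `text`, `history`, `graph`) are hypothetical. One regressor is trained per view, its out-of-fold predictions become the input features of a meta-learner, and the meta-learner outputs the final quality score.

```python
import random

def fit_linear(X, y, ridge=1e-6):
    """Fit y ~ w.x + b by ridge-regularised least squares (normal equations)."""
    rows = [list(x) + [1.0] for x in X]  # append a constant bias feature
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, d):
            f = A[r][c] / A[c][c]
            for k in range(c, d):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    w = [0.0] * d
    for c in reversed(range(d)):
        w[c] = (b[c] - sum(A[c][k] * w[k] for k in range(c + 1, d))) / A[c][c]
    return w  # last entry is the bias

def predict(w, X):
    return [sum(wi * xi for wi, xi in zip(w, x)) + w[-1] for x in X]

def fit_stacked(views, y, folds=5):
    """views: dict mapping view name -> list of feature vectors aligned with y."""
    n, names = len(y), sorted(views)
    # Level 0: out-of-fold predictions per view, so the meta-learner never
    # sees a prediction for an article the view model was trained on.
    meta_X = [[0.0] * len(names) for _ in range(n)]
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        for j, name in enumerate(names):
            w = fit_linear([views[name][i] for i in train], [y[i] for i in train])
            for i, p in zip(test, predict(w, [views[name][i] for i in test])):
                meta_X[i][j] = p
    meta_w = fit_linear(meta_X, y)  # level 1: combine the view predictions
    view_w = {name: fit_linear(views[name], y) for name in names}  # refit on all data
    return names, view_w, meta_w

def predict_stacked(model, views):
    names, view_w, meta_w = model
    per_view = [predict(view_w[name], views[name]) for name in names]
    return predict(meta_w, [list(col) for col in zip(*per_view)])

def make_synthetic(n, seed=0):
    """Synthetic articles: quality depends on one feature from each view (assumption)."""
    rng = random.Random(seed)
    views, y = {"text": [], "history": [], "graph": []}, []
    for _ in range(n):
        t, h, g = [rng.random(), rng.random()], [rng.random()], [rng.random()]
        views["text"].append(t)
        views["history"].append(h)
        views["graph"].append(g)
        y.append(2.0 * t[0] + 1.0 * h[0] + 1.5 * g[0] + rng.gauss(0, 0.05))
    return views, y

views, y = make_synthetic(90)
model = fit_stacked(views, y)
pred = predict_stacked(model, views)
print("training MSE:", sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for articles the view models have already seen, is what keeps it from simply trusting whichever view overfits most. Any regressor could take the place of `fit_linear` at either level.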
