Measuring article quality in Wikipedia: models and evaluation

Wikipedia has grown to be the world's largest and busiest free encyclopedia, in which articles are collaboratively written and maintained by volunteers online. Despite its success as a means of knowledge sharing and collaboration, the public has never stopped criticizing the quality of Wikipedia articles edited by non-experts and inexperienced contributors. In this paper, we investigate the problem of assessing the quality of articles in the collaborative authoring environment of Wikipedia. We propose three article quality measurement models that make use of the interaction data between articles and their contributors, derived from the article edit history. Our Basic model is based on the mutual dependency between the quality of an article and the authority of its authors. The PeerReview model introduces review behavior into the measurement of article quality. Finally, our ProbReview models extend PeerReview with partial reviewership, in which contributors are treated as reviewing only part of an article as they edit various portions of it. We conduct experiments on a set of well-labeled Wikipedia articles to evaluate how closely our quality measurement models match human judgement.
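
The abstract does not state the update equations, so the following is only a minimal sketch of a mutual-reinforcement iteration in the spirit of the Basic model. It assumes that an author's contribution to an article is measured by a word count, that article quality is the contribution-weighted sum of author authority and vice versa, and that scores are L2-normalized after every step. The function name `basic_model`, the `contrib` input format, and the normalization are hypothetical choices made for illustration, not details taken from the paper.

```python
import math


def _normalize(scores):
    """Scale a score dictionary to unit L2 norm (an assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {key: value / norm for key, value in scores.items()}


def basic_model(contrib, iterations=50):
    """Iterate quality and authority scores until they (approximately) settle.

    contrib maps (article, author) pairs to the number of words that the
    author contributed to the article. This input format is hypothetical.
    """
    articles = sorted({article for article, _ in contrib})
    authors = sorted({author for _, author in contrib})
    quality = {article: 1.0 for article in articles}
    authority = {author: 1.0 for author in authors}

    for _ in range(iterations):
        # Article quality: contribution-weighted sum of author authority.
        new_quality = {article: 0.0 for article in articles}
        for (article, author), words in contrib.items():
            new_quality[article] += words * authority[author]
        quality = _normalize(new_quality)

        # Author authority: contribution-weighted sum of article quality.
        new_authority = {author: 0.0 for author in authors}
        for (article, author), words in contrib.items():
            new_authority[author] += words * quality[article]
        authority = _normalize(new_authority)

    return quality, authority


if __name__ == "__main__":
    # Toy edit history: alice wrote most of article A, bob edited both.
    toy = {("A", "alice"): 100, ("A", "bob"): 10, ("B", "bob"): 20}
    quality, authority = basic_model(toy)
    print(quality)
    print(authority)
```

Under these assumptions the iteration is a power method of the kind used in HITS-style ranking: articles score highly when authoritative authors wrote much of them, and authors gain authority by contributing heavily to high-scoring articles. The PeerReview and ProbReview models would additionally need to account for text an author reviewed but did not write, which this sketch does not attempt.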
