On improving Wikipedia search using article quality

Wikipedia is presently the largest free and open online encyclopedia, collaboratively edited and maintained by volunteers. While Wikipedia offers full-text search to its users, the accuracy of its relevance-based search can be compromised by poor-quality articles edited by non-experts and inexperienced contributors. In this paper, we propose a framework that re-ranks Wikipedia search results by taking article quality into account. We develop two quality measurement models, namely Basic and PeerReview, to derive article quality from co-authoring data gathered from articles' edit histories. In a comparison with Wikipedia's full-text search engine, Google, and Wikiseek, our experimental results show that (i) the quality-only ranking produced by PeerReview performs comparably to Wikipedia and Wikiseek; and (ii) PeerReview combined with relevance ranking significantly outperforms Wikipedia's full-text search, delivering search accuracy comparable to Google.
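As an illustration of what combining a quality score with relevance ranking could look like, the following is a minimal sketch only. The abstract does not state how the two signals are combined, so the linear interpolation used here, the weight `alpha`, the min-max normalization, and the function names `min_max` and `rerank` are all assumptions made for this example, not the paper's method.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; all-equal (or empty) input maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def rerank(
    relevance: Dict[str, float],
    quality: Dict[str, float],
    alpha: float = 0.5,
) -> List[str]:
    """Re-rank search results by a weighted mix of relevance and quality.

    ASSUMPTION: linear interpolation with weight `alpha` is a hypothetical
    combination rule chosen for illustration. Articles that have no quality
    score are treated as having quality 0.0.
    """
    rel = min_max(relevance)
    qual = min_max({article: quality.get(article, 0.0) for article in relevance})
    combined = {
        article: alpha * rel[article] + (1.0 - alpha) * qual[article]
        for article in relevance
    }
    # Highest combined score first; ties are broken by article title.
    return sorted(combined, key=lambda article: (-combined[article], article))
```

With `alpha=1.0` the sketch reduces to relevance-only ranking, and with `alpha=0.0` to quality-only ranking, which mirrors the two settings the abstract compares.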
