Assessing Information Quality of a Community-Based Encyclopedia

Effective information quality (IQ) analysis needs powerful yet easy ways to obtain metrics. The English version of Wikipedia provides an interesting yet challenging case for studying IQ dynamics at both the macro and micro levels. We propose seven IQ metrics that can be evaluated automatically and test the set on a representative sample of Wikipedia content. We report the methodology used to construct the metrics and the test results, along with a number of statistical characterizations of Wikipedia articles, their content construction, process metadata, and social context.
