Who Models the World?

Wikidata is a collaborative knowledge graph that is central to many academic and industry IT projects. Its users are responsible for maintaining the schema that organises this knowledge into classes, properties, and attributes, which together form the Wikidata 'ontology'. In this paper, we study the relationship between different Wikidata user roles and the quality of the Wikidata ontology. To do so, we first propose a framework to evaluate the ontology as it evolves. We then cluster editing activities to identify user roles in monthly time frames. Finally, we explore how each role impacts the ontology. Our analysis shows that the Wikidata ontology has uneven breadth and depth. We identify two user roles: contributors and leaders. The leader role is positively associated with ontology depth, with no significant effect on other features. Further work should investigate other dimensions to define user profiles and their influence on the knowledge graph.
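
As an illustration of what "breadth" and "depth" can mean for a class hierarchy, the following is a minimal sketch, not the evaluation framework used in the paper. It assumes the ontology is given as (child, parent) subclass pairs, takes depth to be the longest shortest-path distance from a chosen root class, and takes breadth to be the largest number of classes at any one level. The function name `depth_and_breadth` and the class labels in the usage note are hypothetical.

```python
from collections import defaultdict, deque


def depth_and_breadth(subclass_edges, root):
    """Return (depth, breadth) of the class hierarchy below `root`.

    subclass_edges: iterable of (child, parent) pairs (an assumed input format).
    depth:   largest shortest-path distance from `root` to any reachable class.
    breadth: largest number of classes found at a single level.
    """
    # Index the edges as parent -> set of direct subclasses.
    children = defaultdict(set)
    for child, parent in subclass_edges:
        children[parent].add(child)

    # Breadth-first traversal; the `level` dict also guards against cycles
    # and against visiting a class twice under multiple inheritance.
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in level:
                level[child] = level[node] + 1
                queue.append(child)

    # Count how many classes sit at each level.
    per_level = defaultdict(int)
    for d in level.values():
        per_level[d] += 1

    return max(level.values()), max(per_level.values())
```

For example, with the made-up edges `[("animal", "entity"), ("plant", "entity"), ("mammal", "animal"), ("dog", "mammal"), ("cat", "mammal")]` and root `"entity"`, the function returns `(3, 2)`. Other definitions (for instance, longest rather than shortest path under multiple inheritance) would give different values, so the choice here is only one possible reading.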
