Discovering Beaten Paths in Collaborative Ontology-Engineering Projects using Markov Chains

Biomedical taxonomies, thesauri and ontologies, such as the International Classification of Diseases (a taxonomy) or the National Cancer Institute Thesaurus (an OWL-based ontology), play a critical role in acquiring, representing and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also grown significantly in size. For example, the 11th revision of the International Classification of Diseases, which is currently under active development by the World Health Organization, contains nearly 50,000 classes representing a vast variety of diseases and causes of death. This growth in size has been accompanied by a change in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts into large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners and other stakeholders. Understanding how these stakeholders collaborate will enable us to improve the editing environments that support such collaboration. In this paper, we use Markov chains to analyze the usage logs of five biomedical ontology-engineering projects of varying sizes and scopes, and thereby uncover how large ontology-engineering projects, such as the 11th revision of the International Classification of Diseases, unfold. We discover intriguing interaction patterns (e.g., which properties users frequently change after a given property) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development. From our analysis, we identify commonalities and differences across projects that have implications for project managers, ontology editors, developers and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain.
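To make the kind of analysis mentioned above concrete, the following is a minimal sketch of how first-order Markov chain transition probabilities could be estimated from sequences of changed properties. It is an illustration under stated assumptions, not the authors' actual pipeline: the function name `transition_probabilities`, the session format and the property names are all hypothetical, and the estimate is a plain maximum-likelihood count ratio without smoothing or higher-order states.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities.

    `sessions` is a list of sequences; each sequence lists the properties a
    user changed, in order (hypothetical input format). The result maps each
    state to a dict of successor states and their relative frequencies.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Count every observed (current, next) pair within a session.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    # Normalize each row of counts so that it sums to 1.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical property-change sequences, one list per user session.
sessions = [
    ["title", "definition", "synonym"],
    ["title", "definition", "definition"],
    ["title", "synonym"],
]

probs = transition_probabilities(sessions)
for state, followers in probs.items():
    print(state, followers)
```

In this toy input, "title" is followed by "definition" in two of three sessions, so the estimated probability of that transition is 2/3. States that never appear with a successor (here "synonym") have no row in the result.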
