Statistical mechanics of ontology based annotations

We present a statistical mechanical theory of the process of annotating an object with terms selected from an ontology. The term selection process is formulated as an ideal lattice gas model in a highly structured, inhomogeneous external field. The model explains patterns recently observed in real-world annotation data sets in terms of the underlying graph structure of the ontology. By relating the external field strengths to the information content of each node in the ontology graph, the model also allows us to propose a number of practical metrics for assessing the quality of both the ontology and the annotations that arise from its use. Using the statistical mechanical formalism, we also study an ensemble of ontologies of differing size and complexity, an analysis not readily performed using real data alone. Focusing on regular tree ontology graphs, we uncover a rich set of scaling laws describing the growth in the optimal ontology size as the number of objects being annotated increases. In doing so, we provide a further possible measure for the assessment of ontologies.
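To make the lattice gas picture concrete, the following is a minimal sketch and not the formulation used in the paper itself. It assumes each ontology term is an independent site that is either occupied (the term is used in an annotation) or empty, that the field at a site equals a Resnik-style information content computed from annotation counts, and that occupancy follows the standard non-interacting lattice gas form 1 / (1 + exp(beta * (h - mu))). The toy ontology, the counts, and the names `PARENT`, `information_content`, and `occupancy` are all hypothetical.

```python
import math

# Hypothetical toy ontology as a child -> parent map (a small tree).
PARENT = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a"}


def information_content(counts, parent):
    """Resnik-style information content for every term.

    IC(t) = -log(fraction of annotations at t or at any descendant of t).
    Terms that are never reached get infinite information content.
    """
    cumulative = dict.fromkeys(parent, 0)
    for term, c in counts.items():
        node = term
        while node is not None:  # propagate each count up to the root
            cumulative[node] += c
            node = parent[node]
    total = sum(counts.values())
    return {
        t: (-math.log(n / total) if n > 0 else math.inf)
        for t, n in cumulative.items()
    }


def occupancy(field, beta=1.0, mu=0.0):
    """Mean occupancy of each site of an ideal (non-interacting) lattice gas.

    Assumption: the field h at a site is an energy cost, so
    <n> = 1 / (1 + exp(beta * (h - mu))).
    """
    return {t: 1.0 / (1.0 + math.exp(beta * (h - mu))) for t, h in field.items()}


# Hypothetical annotation counts per term (direct annotations only).
counts = {"a1": 3, "a2": 1, "b": 4}
ic = information_content(counts, PARENT)
occ = occupancy(ic)  # the field strength is taken to be the information content
```

Under these assumptions, more specific terms carry higher information content, sit in a stronger field, and are therefore selected less often. This is one simple way in which the graph structure of an ontology could shape term-usage statistics.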
