A Large Scale Analysis of Information-Theoretic Network Complexity Measures Using Chemical Structures

This paper investigates information-theoretic network complexity measures, which have been used intensively in mathematical and medicinal chemistry, including drug design. Numerous such measures have been developed, but many of them lack a meaningful interpretation; for example, it is often unclear which kind of structural information they detect. Our main contribution is therefore to shed light on the relatedness between selected information measures for graphs by performing a large-scale analysis on chemical networks. Starting from several sets of real and synthetic chemical structures represented as graphs, we study the relatedness between a classical, partition-based complexity measure, the topological information content of a graph, and several others derived from a different paradigm that yields partition-independent measures. Moreover, we evaluate the uniqueness of network complexity measures numerically. In general, high uniqueness is an important and desirable property when designing novel topological descriptors that are to be applied to large chemical databases.
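To make the contrast between the two paradigms concrete, the following is a minimal, brute-force sketch in standard-library Python for small graphs. It assumes that the topological information content is the Shannon entropy of the partition of the vertex set into automorphism orbits, p_i = |X_i|/|V|. For the partition-independent side it assumes a vertex functional modelled on j-sphere functionals, f(v) = α^(Σ_j c_j |S_j(v)|), where S_j(v) is the set of vertices at distance j from v; the coefficients `coeffs` and base `alpha` are hypothetical illustrative choices, not values from this study. Uniqueness is quantified here by a sensitivity score S = (|G| - |nd|)/|G|, where nd is the set of graphs sharing a (rounded) value with at least one other graph; the rounding precision is also an assumption.

```python
import itertools
import math
from collections import Counter, deque


def entropy(probs):
    """Shannon entropy (bits) of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def automorphisms(g):
    """Yield every automorphism of g (dict: vertex -> set of neighbours) by brute force."""
    verts = sorted(g)
    edges = {frozenset((u, v)) for u in g for v in g[u]}
    for perm in itertools.permutations(verts):
        phi = dict(zip(verts, perm))
        # A bijection that maps every edge onto an edge preserves adjacency.
        if all(frozenset(phi[x] for x in e) in edges for e in edges):
            yield phi


def topological_information_content(g):
    """Partition-based measure: entropy of the vertex-orbit partition."""
    autos = list(automorphisms(g))
    orbits = {frozenset(phi[v] for phi in autos) for v in g}
    n = len(g)
    return entropy(len(orbit) / n for orbit in orbits)


def sphere_sizes(g, v):
    """Return [|S_1(v)|, |S_2(v)|, ...] using BFS distances from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    counts = Counter(d for d in dist.values() if d > 0)
    return [counts[j] for j in range(1, max(dist.values()) + 1)]


def sphere_entropy(g, coeffs=(3, 2, 1), alpha=2.0):
    """Partition-independent measure with the hypothetical functional
    f(v) = alpha ** sum_j c_j * |S_j(v)|; spheres beyond len(coeffs) are ignored."""
    f = {v: alpha ** sum(c * s for c, s in zip(coeffs, sphere_sizes(g, v))) for v in g}
    total = sum(f.values())
    return entropy(fv / total for fv in f.values())


def sensitivity(values, ndigits=6):
    """Fraction of graphs whose rounded measure value is shared with no other graph."""
    counts = Counter(round(x, ndigits) for x in values)
    unique = sum(1 for x in values if counts[round(x, ndigits)] == 1)
    return unique / len(values)


if __name__ == "__main__":
    path3 = {0: {1}, 1: {0, 2}, 2: {1}}
    star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
    cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    graphs = [path3, star, cycle4, path4]
    for name, measure in [("orbit entropy", topological_information_content),
                          ("sphere entropy", sphere_entropy)]:
        values = [measure(g) for g in graphs]
        print(name, [round(v, 4) for v in values], "sensitivity:", sensitivity(values))
```

The cycle illustrates why the two paradigms can disagree: all vertices of a vertex-transitive graph fall into one orbit, so the orbit entropy is 0, while a vertex-functional entropy assigns every vertex the same probability and is therefore maximal (log2 of the vertex count). The permutation search is exponential in the number of vertices and only serves to illustrate the definition; a real analysis of large chemical databases would need a proper automorphism-group algorithm.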
