Relations frequency hypermatrices in mutual, conditional, and joint entropy‐based information indices

Graph‐theoretic matrix representations constitute the most popular and significant source of topological molecular descriptors (MDs). Recently, we introduced a novel matrix representation, named the duplex relations frequency matrix, F, derived from the generalization of an incidence matrix whose row entries are connected subgraphs of a given molecular graph G. Using this matrix, a series of information indices (IFIs) was proposed. In this report, an extension of F is presented, introducing for the first time the concept of a hypermatrix in graph‐theoretic chemistry. The hypermatrix representation explores the n‐tuple participation frequencies of vertices in a set of connected subgraphs of G. In this study, however, we focus on triple and quadruple participation frequencies, which generate the triple and quadruple relations frequency matrices, respectively. The introduction of hypermatrices allows us to redefine the recently proposed MDs, that is, the mutual, conditional, and joint entropy‐based IFIs, in a generalized way. These IFIs are implemented in GT‐STAF (acronym for Graph Theoretical Thermodynamic STAte Functions), a new module of the TOMOCOMD‐CARDD program. Information‐theoretic variability analysis of the proposed IFIs suggests that the use of hypermatrices enhances the entropy and, hence, the variability of the previously proposed IFIs, especially the conditional and mutual entropy‐based IFIs. The predictive capacity of the proposed IFIs was evaluated by analyzing the regression models obtained for two physicochemical properties, the partition coefficient (Log P) and the specific rate constant (Log K), of 34 derivatives of 2‐furylethylene. The statistical parameters of the best models obtained for these properties were compared with those reported in the literature and showed better performance. This result suggests that the hypermatrix‐based approach, used to redefine the previously proposed IFIs, provides further valuable tools for QSPR studies and diversity analysis. © 2012 Wiley Periodicals, Inc.
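
To make the idea of n‐tuple participation frequencies more concrete, the following Python sketch counts, for every ordered n‐tuple of vertices, how many connected subgraphs contain all of them, and then computes a Shannon entropy over the normalized counts. It is only an illustration under stated assumptions: the function names `relations_frequency` and `joint_entropy`, the sparse dictionary storage, the inclusion of tuples with repeated vertices, and the normalization used for the entropy are all hypothetical choices. They are not the definitions used for the IFIs or in GT‐STAF, which this abstract does not give.

```python
from itertools import product
from math import log2


def relations_frequency(subgraphs, order):
    """Sparse order-n frequency hypermatrix (assumed form, not the paper's).

    Each key is an ordered n-tuple of vertices; its value is the number of
    subgraphs that contain every vertex in the tuple. order=2 corresponds
    to a duplex matrix, order=3 to triple, order=4 to quadruple.
    """
    freq = {}
    for subgraph in subgraphs:
        members = sorted(set(subgraph))
        # Every n-tuple drawn from this subgraph's vertices participates once.
        for tup in product(members, repeat=order):
            freq[tup] = freq.get(tup, 0) + 1
    return freq


def joint_entropy(freq):
    """Shannon entropy (bits) of the frequencies normalized to sum to 1."""
    total = sum(freq.values())
    return -sum((f / total) * log2(f / total) for f in freq.values() if f > 0)


# Toy example: path graph 0-1-2, using its two edges and the full path
# as the set of connected subgraphs (an arbitrary choice for illustration).
subgraphs = [{0, 1}, {1, 2}, {0, 1, 2}]
duplex = relations_frequency(subgraphs, order=2)
triple = relations_frequency(subgraphs, order=3)

print(duplex[(0, 1)], triple[(0, 1, 2)])
print(joint_entropy(duplex), joint_entropy(triple))
```

In this toy case vertices 0 and 1 co‐occur in two subgraphs, whereas all three vertices co‐occur only in the full path. Raising the order spreads the counts over many more cells, which is the kind of effect the variability analysis above is concerned with.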
