Determining structural similarity of chemicals using graph-theoretic indices

Abstract: Ninety graph-theoretic indices were calculated for a diverse set of 3692 chemicals to test whether such indices can be used to determine the similarity of chemicals in a large, diverse database of structures. Principal component analysis was used to reduce the 90-dimensional space to a 10-dimensional subspace that explains 93% of the variance. Distance between chemicals in this 10-dimensional space was used as the measure of similarity. To test this approach, ten chemicals were chosen at random from the set of 3692, and the five nearest neighbors of each of these ten target chemicals were determined. The results show that this measure of similarity reflects intuitive notions of chemical similarity.
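The pipeline described in the abstract (index matrix, principal component reduction, nearest neighbors by distance) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say whether the indices were standardized or which distance was used, so the column standardization and the Euclidean metric here are assumptions, the function names `pca_scores` and `nearest_neighbors` are hypothetical, and the data are synthetic random numbers standing in for real index values (200 rows rather than 3692, to keep the example small).

```python
import numpy as np


def pca_scores(X, n_components):
    """Project the rows of X onto the leading principal components.

    Returns (scores, explained), where explained is the fraction of total
    variance captured by the retained components.
    """
    X = np.asarray(X, dtype=float)
    # Assumption: each index (column) is standardized before PCA.
    centered = X - X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns stay at zero after centering
    Z = centered / std
    # Rows of Vt are the principal axes; squared singular values are
    # proportional to the variance along each axis.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2
    explained = var[:n_components].sum() / var.sum()
    scores = Z @ Vt[:n_components].T
    return scores, explained


def nearest_neighbors(scores, target, k):
    """Indices of the k rows closest to row `target`, excluding the row itself.

    Assumption: Euclidean distance in the reduced space.
    """
    dist = np.linalg.norm(scores - scores[target], axis=1)
    dist[target] = np.inf  # never report the target as its own neighbor
    return np.argsort(dist)[:k]


# Synthetic stand-in for the index matrix: 200 "chemicals" x 90 "indices",
# generated from 10 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 10))
mixing = rng.normal(size=(10, 90))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 90))

scores, explained = pca_scores(X, n_components=10)

# Ten random targets, five nearest neighbors each, as in the abstract.
targets = rng.choice(200, size=10, replace=False)
neighbors = {int(t): nearest_neighbors(scores, int(t), 5).tolist() for t in targets}

print(f"variance explained by 10 components: {explained:.3f}")
for t, nn in neighbors.items():
    print(t, nn)
```

With real data, `X` would hold one row per chemical and one column per computed index, and the neighbor lists would be inspected by a chemist to judge whether they match intuitive similarity.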
