Evaluation of the molecular similarity and property prediction for QSAR purposes

Abstract: A string-comparison method has been developed and applied to measuring the molecular similarity of chemical structures. Molecular structures were encoded as sequences of numbers, each giving the count of paths of a particular length. The similarity index between two compounds was calculated as the difference between the gains of information derived from a comparison of the corresponding molecular path sequences. Strings representing the ordering of compounds by similarity were then used to cluster the elements of the data set. An unknown compound was classified into one of the resulting clusters, and the properties associated with that cluster served as the basis for predicting some of its molecular properties. The method is illustrated on two groups of compounds, barbiturates and benzamidines. The algorithms and programs used are described briefly.
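
The encoding step can be illustrated with a minimal sketch. It assumes that "paths" means self-avoiding paths in the hydrogen-suppressed molecular graph and that length is measured in bonds; the function name `path_counts` and the edge-list input are hypothetical choices for illustration, not the authors' program. The abstract does not give the formula for the information-based similarity index, so the sketch stops at the path sequence.

```python
from collections import defaultdict


def path_counts(edges):
    """Return [p1, p2, ...], where pk is the number of self-avoiding
    paths of length k (in bonds) in an undirected molecular graph.

    `edges` is a list of (atom, atom) pairs; this input format is an
    assumption made for the example.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    counts = defaultdict(int)

    def extend(node, visited, length):
        # Try to grow the current path by one bond to any unvisited neighbour.
        for nxt in adj[node]:
            if nxt in visited:
                continue
            counts[length + 1] += 1
            extend(nxt, visited | {nxt}, length + 1)

    for start in list(adj):
        extend(start, {start}, 0)

    # Every path was enumerated once from each end, so halve the totals.
    longest = max(counts, default=0)
    return [counts[k] // 2 for k in range(1, longest + 1)]


# Carbon skeletons of two C4 isomers (atoms numbered arbitrarily).
n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (1, 2), (1, 3)]

print(path_counts(n_butane))   # [3, 2, 1]
print(path_counts(isobutane))  # [3, 3]
```

The two isomers have the same number of bonds but different path sequences, which is what allows sequences of this kind to be compared as strings. The exhaustive enumeration grows quickly with ring count, so it is only practical for small molecules.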
