A Fuzzy Kohonen SOM Implementation and Clustering of Bio-active Compound Structures for Drug Discovery

Hierarchical methods such as Ward's and group average (Gave), and nonhierarchical methods such as Jarvis-Patrick and k-means, are the preferred methods for clustering diverse sets of compounds for a number of drug targets using fingerprint-based descriptors. In this work, the application of the fuzzy Kohonen neural network and other self-organizing map (SOM) algorithms to the clustering of chemical datasets is investigated. SOM networks usually have a number of parameters, such as the learning rate and neighborhood size, that are selected heuristically. In the fuzzy self-organizing map, by contrast, an optimization technique selects these parameters automatically through fuzzy membership functions. The fuzzy SOM is evaluated against other SOM neural networks (namely Kohonen, neural gas, and enhanced neural gas) and against Ward's and group average methods for clustering different biologically active chemical structures using topological descriptors. The results show that the fuzzy SOM not only outperforms the other SOM networks tuned with the best heuristics but also gives better results than Ward's and group average methods.
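The contrast above, between heuristically chosen SOM parameters and a fuzzy SOM that derives them from membership functions, can be illustrated with a minimal sketch. It assumes the fuzzy SOM follows the fuzzy Kohonen clustering network (FKCN) scheme of Tsao, Bezdek and Pal (1994). In that scheme, the fuzzifier m is lowered linearly from m0 towards 1, and the per-sample learning rate is the fuzzy membership raised to the power m. No learning-rate or neighborhood schedule is tuned by hand. The function name `fkcn`, its parameters (`m0`, `t_max`, `eps`) and the farthest-first initialization are illustrative assumptions. This is not the authors' implementation.

```python
import math


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fkcn(data, c, m0=2.0, t_max=50, eps=1e-12):
    """Sketch of a fuzzy Kohonen clustering network (FKCN); names are hypothetical.

    data  -- list of equal-length feature vectors (e.g. topological descriptors)
    c     -- number of prototypes / clusters
    m0    -- initial fuzzifier (> 1), lowered linearly towards 1
    t_max -- number of batch iterations
    Returns (prototypes, u) where u[i][k] is the membership of sample k in cluster i.
    """
    if m0 <= 1.0 or not 1 <= c <= len(data):
        raise ValueError("need m0 > 1 and 1 <= c <= len(data)")
    n, dim = len(data), len(data[0])
    # Assumed initialization: deterministic farthest-first selection of c samples.
    protos = [list(data[0])]
    while len(protos) < c:
        far = max(data, key=lambda x: min(_sq_dist(x, v) for v in protos))
        protos.append(list(far))
    u = [[0.0] * n for _ in range(c)]
    dm = (m0 - 1.0) / t_max
    for t in range(t_max):
        # Fuzzifier schedule m_t = m0 - t*dm; it stays above 1 because t < t_max.
        m = m0 - t * dm
        # u_ik = d_ik^(-2/(m-1)) / sum_j d_jk^(-2/(m-1)), evaluated as a
        # softmax over -log(d_ik^2)/(m-1) so that m close to 1 does not overflow.
        for k, x in enumerate(data):
            logits = [-math.log(_sq_dist(x, v) + eps) / (m - 1.0) for v in protos]
            top = max(logits)
            w = [math.exp(z - top) for z in logits]
            s = sum(w)
            for i in range(c):
                u[i][k] = w[i] / s
        # Learning rates alpha_ik = u_ik^m take the place of the hand-tuned
        # learning rate and neighborhood size of a classic Kohonen SOM.
        for i in range(c):
            alpha = [u[i][k] ** m for k in range(n)]
            total = sum(alpha)
            if total == 0.0:
                continue  # prototype receives no weight in this iteration
            for d in range(dim):
                step = sum(alpha[k] * (data[k][d] - protos[i][d]) for k in range(n))
                protos[i][d] += step / total
    return protos, u
```

For example, on two well-separated groups of hypothetical two-dimensional descriptor vectors, `fkcn(data, c=2)` assigns each group to its own prototype once memberships are hardened with `max(range(2), key=lambda i: u[i][k])`. As m approaches 1, the memberships become nearly crisp and each update approaches a hard c-means step. This mirrors how FKCN moves from fuzzy to crisp partitions without a hand-set learning-rate schedule.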
