An In Silico Approach to Cluster CAM Kinase Protein Sequences

As we usher in a new, data-driven age, we face the enormous challenge of deriving information from the heaps of data available. The amount of data being generated is overwhelming, which calls for exploring novel and effective methods for clustering and classifying such data. The CAM kinase family is known to contain many enzymes involved in important physiological processes. In the present study, 13 important physicochemical parameters were calculated in silico for 56 sequences of the CAM kinase family. Self-organizing maps (SOMs) were employed to classify and cluster similar sequences and to visualize the high-dimensional data space, as they are known for their ability to preserve the topological relationships between features. The SOM yielded 4 clusters that were distinct from each other and marked by characteristic features.
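To make the clustering step concrete, the following is a minimal sketch of a self-organizing map in Python with NumPy. It is not the implementation used in the study: the 2 x 2 grid, the learning-rate and neighbourhood schedules, the z-score scaling, and the function names `standardize`, `train_som`, `best_matching_unit`, and `assign_clusters` are all assumptions made for illustration, and the demonstration data are random placeholders with the same shape (56 sequences, 13 parameters) rather than the actual CAM kinase values.

```python
import numpy as np


def standardize(data):
    """Z-score each feature so no single parameter dominates the distances."""
    return (data - data.mean(axis=0)) / data.std(axis=0)


def best_matching_unit(weights, x):
    """Return the (row, col) of the map unit whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(data, rows=2, cols=2, epochs=100, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM (grid size and schedules are assumptions)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = data.shape
    weights = rng.random((rows, cols, n_features))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    ).astype(float)
    total_steps = epochs * n_samples
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(n_samples):
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3      # shrinking neighbourhood radius
            bmu = np.array(best_matching_unit(weights, x))
            d2 = np.sum((grid - bmu) ** 2, axis=-1)   # squared grid distance to the BMU
            h = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian neighbourhood function
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


def assign_clusters(weights, data):
    """Label each sample with the flat index of its best-matching unit."""
    rows, cols, _ = weights.shape
    return np.array(
        [np.ravel_multi_index(best_matching_unit(weights, x), (rows, cols)) for x in data]
    )


if __name__ == "__main__":
    # Placeholder data only: 56 "sequences" x 13 "parameters" of random numbers.
    rng = np.random.default_rng(42)
    features = standardize(rng.normal(size=(56, 13)))
    som = train_som(features)
    labels = assign_clusters(som, features)
    print(np.bincount(labels, minlength=4))  # number of sequences per map unit
```

With a 2 x 2 map, each sequence is assigned to one of at most four units, which mirrors the four clusters reported above only in number; the actual map size and training settings are not stated here.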
