Using Cluster Analysis for Protein Secondary Structure Prediction

As biomedical research and healthcare progress in the genomic and post-genomic era, bioinformatics faces a number of important challenges and opportunities. The key challenges essentially all relate to the current flood of raw data, aggregate information, and evolving knowledge arising from the study of the genome and its manifestation. Protein structure determination and prediction have been a focal research subject in the life sciences because protein structure is important for understanding the biological and chemical activities of organisms. The experimental methods used to determine protein structures demand sophisticated equipment and considerable time. A host of computational methods have therefore been developed to predict the location of secondary structure elements in proteins, either to complement experimental results or to provide insight into them. The present work focuses on secondary structure prediction of proteins. A data mining model is implemented to predict the parameters that describe secondary structure, which include alpha helices, beta sheets, and hairpin turns. Cluster analysis is used to implement the prediction.
