Predicting the secondary structure of proteins using machine learning algorithms

The function of a protein in a living organism is related to its 3D structure, which is ultimately determined by the linear sequence of amino acids that forms the macromolecule. It is therefore of great importance to understand and predict how a protein's 3D structure arises from a particular sequence of amino acids. In this paper we report the application of machine learning methods to predict, with high accuracy, the secondary structure of proteins, namely alpha-helices and beta-sheets, which are intermediate levels of local structure.
