A survey of machine learning methods for secondary and supersecondary protein structure prediction.

This chapter surveys machine learning methods for protein secondary and supersecondary structure prediction. Our focus is on methods applicable to β-hairpin and β-sheet prediction, but we also discuss methods for more general supersecondary structure prediction. We provide background on the secondary and supersecondary structures covered, the features used to describe them, and the basic theory behind the machine learning methods used. We then survey the available methods and compare them where possible.
