Feature identification and reduction for improved generalization accuracy in secondary-structure prediction

Protein secondary structure prediction is an important step in understanding gene function. Several algorithms have been proposed that apply machine learning techniques to this problem. This research examines these algorithms and constructs a framework for producing accurate predictions.
