Reaching Optimized Parameter Set: Protein Secondary Structure Prediction Using Neural Network

We propose an optimized parameter set for protein secondary structure prediction using a three-layer feed-forward back-propagation neural network. The methodology varies four parameters: the encoding scheme, the window size, the number of neurons in the hidden layer and the type of learning algorithm. The number of neurons in the input layer varies from 3 to 19, corresponding to the different window sizes. The number of neurons in the hidden layer is a natural number from 1 to 20. The output layer consists of three neurons, one for each of the known secondary structural classes: α-helix, β-strand and coil/turn. Eight different learning algorithms and nine encoding schemes are used. Exhaustive experiments were performed on a non-homologous dataset. The experimental results were compared using the performance measures Q3, sensitivity, specificity, Matthews correlation coefficient and accuracy. The paper also discusses the process of obtaining a stabilized cluster of 2530 records from a collection of 11340 records. The graphs of these stabilized clusters of records with respect to accuracy are concave, the convergence is monotonically increasing and the rate of convergence is uniform. The highest accuracy, 78%, is obtained with BLOSUM62 as the encoding scheme, 19 as the window size, 19 as the number of neurons in the hidden layer and One Step Secant as the learning algorithm. These parameter values are proposed as the optimized parameter set for the three-layer feed-forward back-propagation neural network for protein secondary structure prediction.
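As an illustration of the performance measures named above, the following minimal Python sketch computes Q3 and the per-class sensitivity, specificity and Matthews correlation coefficient from an observed and a predicted three-state string. The function names, the single-letter labels (H for helix, E for strand, C for coil) and the example strings are assumptions made for illustration only; they are not taken from the paper, and the paper's own evaluation code may differ.

```python
from math import sqrt


def q3(observed, predicted):
    """Fraction of residues whose three-state class is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


def per_class_measures(observed, predicted, cls):
    """One-vs-rest sensitivity, specificity and Matthews correlation for cls."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == cls and p == cls for o, p in pairs)  # true positives
    tn = sum(o != cls and p != cls for o, p in pairs)  # true negatives
    fp = sum(o != cls and p == cls for o, p in pairs)  # false positives
    fn = sum(o == cls and p != cls for o, p in pairs)  # false negatives

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


# Hypothetical example: one helix residue is mispredicted as strand.
observed = "HHHHEEEECC"
predicted = "HHHEEEEECC"

overall = q3(observed, predicted)
helix = per_class_measures(observed, predicted, "H")
```

The per-class measures treat each structural class in a one-vs-rest fashion, which is the usual way to apply binary measures such as sensitivity and specificity to a three-class problem.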
