Prediction of protein structural features by use of artificial neural networks

In the past decades, biological sequence data have grown exponentially. The cost of DNA sequencing has dropped significantly since the announcement of the first sequenced genome, and newly sequenced genomes are published almost every week. Publicly available genetic sequence databases such as GenBank are growing considerably; GenBank currently contains more than 132 million sequences. Similarly, the Protein Data Bank currently contains more than 71,000 experimentally determined structures of nucleic acids, proteins and nucleic acid/protein complexes. DNA sequences therefore vastly outnumber experimentally verified proteins. The academic and industrial research community consequently has to rely on structure predictions rather than wait for time-consuming experimental structure determination. This thesis describes the development of two new tools to study such genetic sequence data. NetSurfP was developed to predict the surface accessibility of amino acids in amino acid sequences. Knowledge of the degree of surface exposure of an amino acid is valuable and has been used to enhance the understanding of a variety of biological problems, including protein-protein interactions and the prediction of epitopes and active sites. Following NetSurfP, NetTurnP was developed for the prediction of β-turn occurrence. Using secondary structure and surface accessibility predictions from NetSurfP improved the performance of β-turn prediction and gave a better understanding of it. β-turns are of particular interest because they are the most abundant type of turn structure: approximately 25% of all amino acids in protein structures are located in a β-turn. In bioinformatics, speed and accuracy are important factors, so the developed tools are expected to return results rapidly and efficiently.
To keep response times short, we precalculated results for protein sequence data; currently, more than 500,000 protein sequences are held in the local cache. In relation to surface exposure, a third project dealt with the prediction of discontinuous B-cell epitopes. Here, Half Sphere Exposure (HSE) was integrated into an existing prediction method. HSE is a measure of solvent exposure in which the upper and lower epitope contacts to a given residue can be weighted differently. Integrating HSE improved on previously obtained results. Lastly, I present an attempt to predict the specificity of the HIV-1 protease. As the protease is essential for the life cycle of HIV, it is of great interest as a target for the rational design of drugs against HIV. We show that it is possible to predict the specificity of the HIV protease with high performance. In the process we also identified new possible cleavage sites, which will be verified experimentally in the lab. In summary, the work presented in this thesis contributes new bioinformatics tools that will hopefully aid future scientific discoveries.
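
As an illustration of the Half Sphere Exposure idea described above, the following minimal sketch counts neighbouring Cα atoms in the half sphere on the side-chain side of a residue (up) and in the opposite half sphere (down), then combines the two counts with separate weights. It is not the implementation used in this thesis: the function names, the 13 Å radius and the example weights are assumptions chosen for illustration only.

```python
import math

def half_sphere_exposure(ca, cb, other_ca, radius=13.0):
    """Count neighbouring C-alpha atoms in the upper and lower half spheres.

    ca, cb   : (x, y, z) coordinates of the residue's C-alpha and C-beta atoms.
    other_ca : C-alpha coordinates of all other residues.
    radius   : sphere radius in angstroms (assumed value, for illustration).
    """
    # The C-alpha -> C-beta vector points towards the side chain ("up").
    direction = [b - a for a, b in zip(ca, cb)]
    up = down = 0
    for atom in other_ca:
        offset = [x - a for a, x in zip(ca, atom)]
        if math.sqrt(sum(d * d for d in offset)) > radius:
            continue  # outside the sphere, not a contact
        # The sign of the dot product tells which half sphere the atom is in.
        if sum(d * o for d, o in zip(direction, offset)) >= 0:
            up += 1
        else:
            down += 1
    return up, down

def weighted_exposure(up, down, w_up=1.0, w_down=0.5):
    """Combine the two counts with separate (hypothetical) weights."""
    return w_up * up + w_down * down

# Toy coordinates, not taken from a real structure.
up, down = half_sphere_exposure(
    ca=(0.0, 0.0, 0.0),
    cb=(0.0, 0.0, 1.5),
    other_ca=[(0.0, 0.0, 5.0), (3.0, 0.0, 4.0), (0.0, 0.0, -6.0), (0.0, 0.0, 20.0)],
)
score = weighted_exposure(up, down)
```

Keeping the two counts separate until the final step is what allows the upper and lower contacts to be weighted differently.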
