PSSP: Protein splice site prediction algorithm using Bayesian approach

This study introduces an algorithm for identifying the intein motifs and blocks involved in protein splicing, and explores the methods underlying the detection of protein motifs. Inteins are mobile protein-splicing elements that excise themselves post-translationally. They occur in viruses and bacteriophages; notwithstanding this broad phylogenetic distribution, all inteins share common structural features. We developed a method to predict inteins in a raw sequence using a ranking and scoring scheme based on amino acid θ value tables. The method aids in identifying and assessing the patterns that characterize intein sequences. New conserved properties of inteins are revealed, and known ones are described and localized. We computed the θ value of each amino acid at block A positions +1 to +13, block B positions l+13 to l+26, and block G positions -7 to +1 for the three categories; the consensus amino acids found are listed at the end of each row. We also report statistics for the distances between blocks: block A to B, block B to F, and block F to G average 66.1, 294, and 10.2 amino acids, respectively. The actual blocks A, B, and G of the one intein found in the vacuolar membrane ATPase subunit, a precursor protein, are ranked 1. The results indicate that all block sequences found in nine proteins are ranked at the top of the list. The intein sequence is used to search databases for intein-like proteins. Understanding the functional, structural, and dynamical aspects of inteins is important for intein engineering and for improving intein databases.
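The ranking and scoring scheme described above can be illustrated with a minimal sketch. The abstract does not define θ, so the code assumes, purely for illustration, that θ is a log-odds score: the pseudocount-smoothed frequency of an amino acid at a block position divided by a uniform background. The function names `theta_table` and `rank_windows`, the pseudocount, the toy training blocks, and the query sequence are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def theta_table(aligned_blocks, pseudocount=1.0):
    """Build one score dictionary per block position.

    Assumption (not from the paper): theta is the log of the smoothed
    positional frequency divided by a uniform background frequency.
    """
    length = len(aligned_blocks[0])
    if any(len(block) != length for block in aligned_blocks):
        raise ValueError("all training blocks must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_blocks) + pseudocount * len(AMINO_ACIDS)
    table = []
    for pos in range(length):
        counts = Counter(block[pos] for block in aligned_blocks)
        table.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return table


def rank_windows(sequence, table):
    """Score every window of the raw sequence and sort best-first.

    Returns a list of (score, start_index, window) tuples. Residues
    outside the 20 standard amino acids contribute a score of 0.
    """
    width = len(table)
    scored = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(table[i].get(aa, 0.0) for i, aa in enumerate(window))
        scored.append((score, start, window))
    # Highest score first; ties are broken by the earlier start position.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored


# Toy data, invented for this example only (not real intein blocks).
toy_blocks = ["CLAEG", "CLSYD", "CFAKG", "CLTGD"]
toy_sequence = "MKTAYCLAEGTRW"

ranked = rank_windows(toy_sequence, theta_table(toy_blocks))
best_score, best_start, best_window = ranked[0]
print(best_start, best_window)
```

In this toy case the window matching a training block is ranked first, which mirrors the abstract's report that the true blocks rank at the top of the list. A fuller version would keep a separate table for each block and use the reported inter-block distances to restrict which candidate windows are combined.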
