STAR: predicting recombination sites from amino acid sequence

BackgroundDesigning novel proteins with site-directed recombination has enormous prospects. By locating effective recombination sites for swapping sequence parts, the probability that hybrid sequences have the desired properties is increased dramatically. The prohibitive requirements for applying current tools led us to investigate machine learning to assist in finding useful recombination sites from amino acid sequence alone.ResultsWe present STAR, Site Targeted Amino acid Recombination predictor, which produces a score indicating the structural disruption caused by recombination, for each position in an amino acid sequence. Example predictions contrasted with those of alternative tools, illustrate STAR'S utility to assist in determining useful recombination sites. Overall, the correlation coefficient between the output of the experimentally validated protein design algorithm SCHEMA and the prediction of STAR is very high (0.89).ConclusionSTAR allows the user to explore useful recombination sites in amino acid sequences with unknown structure and unknown evolutionary origin. The predictor service is available from http://pprowler.itee.uq.edu.au/star.

[1]  W. Stemmer,et al.  DNA shuffling of a family of genes from diverse species accelerates directed evolution , 1998, Nature.

[2]  S. Sundararajan,et al.  Predictive Approaches for Choosing Hyperparameters in Gaussian Processes , 1999, Neural Computation.

[3]  Christopher A. Voigt,et al.  Protein building blocks preserved by recombination , 2002, Nature Structural Biology.

[4]  Piero Fariselli,et al.  I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure , 2005, Nucleic Acids Res..

[5]  Christopher A. Voigt,et al.  Functional evolution and structural conservation in chimeric cytochromes p450: calibrating a structure-guided approach. , 2004, Chemistry & biology.

[6]  Stephen J Benkovic,et al.  FamClash: A method for ranking the activity of engineered enzymes , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[7]  P. Baldi,et al.  Prediction of coordination number and relative solvent accessibility in proteins , 2002, Proteins.

[8]  S. Benkovic,et al.  Rapid generation of incremental truncation libraries for protein engineering using alpha-phosphothioate nucleotides. , 2001, Nucleic acids research.

[9]  C. Wilke,et al.  On the conservative nature of intragenic recombination. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Costas D Maranas,et al.  Design of combinatorial protein libraries of optimal size , 2005, Proteins.

[11]  Mikael Bodén,et al.  Prediction of protein continuum secondary structure with probabilistic models based on NMR solved structures , 2006, BMC Bioinformatics.

[12]  Marc Ostermeier,et al.  A combinatorial approach to hybrid enzymes independent of DNA homology , 1999, Nature Biotechnology.

[13]  Zheng Yuan,et al.  Better prediction of protein contact number using a support vector regression analysis of amino acid sequence , 2005, BMC Bioinformatics.

[14]  Frances H Arnold,et al.  General method for sequence-independent site-directed chimeragenesis. , 2003, Journal of molecular biology.

[15]  Frances H Arnold,et al.  Library analysis of SCHEMA‐guided protein recombination , 2003, Protein science : a publication of the Protein Society.

[16]  U. Hobohm,et al.  Selection of representative protein data sets , 1992, Protein science : a publication of the Protein Society.

[17]  Piero Fariselli,et al.  ConSeq: the identification of functionally and structurally important residues in protein sequences , 2004, Bioinform..

[18]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[19]  Frances H Arnold,et al.  To whom correspondence should be addressed. , 2022 .

[20]  Arlo Z. Randall,et al.  Prediction of protein stability changes for single‐site mutations using support vector machines , 2005, Proteins.

[21]  Giovanni Soda,et al.  Exploiting the past and the future in protein secondary structure prediction , 1999, Bioinform..

[22]  Costas D Maranas,et al.  Using a residue clash map to functionally characterize protein recombination hybrids. , 2003, Protein engineering.

[23]  S. Hua,et al.  A novel method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. , 2001, Journal of molecular biology.

[24]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.