Search and Analysis of the Sequence Space of a Protein Using Computational Tools

The application of enzymes as catalysts for industrial processes gave rise to the rapidly growing field of biocatalysis. Numerous enzymes have been discovered and characterized according to their functions and/or three-dimensional structures. Directed Evolution (DE) is an area of biocatalysis research in which mutations are introduced into the sequence of a native, or wild-type, enzyme. These mutations are made at random, with the aim of finding an alternate sequence, or variant, that improves on the wild-type enzyme for a specific property. The sequence space of an enzyme is the set of all possible variants that can be created from it. Because the number of such sequences is immense, making mutations at random is not an optimal strategy. However, in the absence of a proper understanding of how the sequence of a protein translates into its function, that is, a sequence-to-function map, DE is usually the only available option. In this thesis, a computational approach to improving the process of DE is presented, which uses machine learning algorithms. When an enzyme is subjected to DE, a large number of its variants are created and analyzed through high-throughput screening methods. The screening results provide a measure of the property of interest, such as catalytic activity towards a specific reaction, for each of these variants. These data can be used to search for patterns in the sequence space, which can lead to an understanding of how function is related to an enzyme's sequence. However, the critical limitation of this approach is the scarcity of data: sequencing the created variants is a sizeable task, so only a relatively small number of sequences can be expected to be available. Most machine learning methods, on the other hand, usually require a large number of examples in the data set.
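To give a rough sense of the scale involved, the size of a sequence space can be counted directly. The sketch below is illustrative only: it assumes the 20 standard amino acids and substitutions only (no insertions or deletions), and the length of 100 residues is a hypothetical value, not one taken from this thesis.

```python
from math import comb

AMINO_ACIDS = 20  # assumption: the standard amino acid alphabet


def full_space(length):
    """Number of distinct sequences of the given length."""
    return AMINO_ACIDS ** length


def variants_with_k_substitutions(length, k):
    """Number of variants that differ from the wild type at exactly k positions."""
    # Choose which k positions change, then one of 19 other residues at each.
    return comb(length, k) * (AMINO_ACIDS - 1) ** k


length = 100  # hypothetical protein length, for illustration only
single = variants_with_k_substitutions(length, 1)
double = variants_with_k_substitutions(length, 2)
digits = len(str(full_space(length)))

print("single mutants:", single)
print("double mutants:", double)
print("decimal digits in the full sequence space:", digits)
```

Under these assumptions, the double mutants alone exceed a million, whereas a sequenced library of the order of 10^2 variants covers only a tiny fraction of them.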
To circumvent this scarcity of sequence data, a simplifying assumption was made, whereby each variant was assigned to one of two classes, positive or negative. The criterion for this division can be selected based on the property of interest measured for the variant by the screening method. Such an assumption reduces the problem to one of non-linear classification into binary classes. Efficient algorithms have been developed for such problems, and these may be able to give pertinent results from the available data. Chapters 1 and 2 of this thesis introduce the basic concepts of protein engineering and Directed Evolution. The suggested approach of using machine learning to analyze the sequence space of any protein or enzyme is described. Chapter 2 also provides background information on the different experimental procedures that are performed during a DE process. It also reviews research on applying different computational strategies to improve DE. Background is provided on the different machine learning algorithms that were used with the data available from DE. Support Vector Machines (SVMs) are a recently developed class of algorithms that are primarily used for non-linear classification. Chapter 3 describes an SVM that was formulated to identify important amino acids in the sequence of a protein. An important amino acid residue was defined as one that, if mutated, results in an inactive variant. The data used were the variant sequences containing random mutations created by using different protocols of DE. Based on their screening, the variants were classified as positive if they had any catalytic activity, or negative if they had none. This algorithm was applied to the TEM-1 β-lactamase sequence. The reason for this choice was the availability of known significant amino acid residues for the TEM-1 β-lactamase sequence, which had been found through extensive experiments.
In silico, or computer-generated, variants were created by simulating the different protocols of DE. It was shown that the SVM can efficiently identify such residues from a relatively small number (of the order of 10^2) of variant sequences. Chapter 4 extends the framework described in Chapter 3 to identify pairs of amino acids that interact with each other. Only interactions that contribute significantly to the function or structural stability of the protein need to be identified. Boolean Learning, which is another class of machine learning problems, was used for this purpose. The specific algorithm used is called OCAT (One Clause At a Time). The data assumed in this case were the variant sequences created specifically by the recombinant protocols of DE (described in Chapter 2). Positive and negative variants were defined as catalytically active and inactive, respectively. It was shown through simulations that the OCAT algorithm can identify the interacting pairs in the sequence. This result was also shown to be independent of where the individual amino acids involved in the interaction are located in the primary sequence of the protein. A novel way of combining the OCAT algorithm with SVMs was also introduced. It was shown that the results obtained with the combination of these two algorithms, referred to as BLSVM, were better than those obtained with OCAT alone. It was also suggested that the results obtained by using these algorithms on the variants generated in one round of DE can help improve the fraction of active, or positive, variants in subsequent rounds. This is achieved by not permitting any mutations in the amino acids that interact with each other. To show the applicability of the algorithm described in Chapter 4, an experimental study is presented in Chapter 5. Two fluorescent proteins, mRFP and DsRed, were subjected to different recombinant protocols, and 83 unique variants were generated.
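The Boolean learning step described for Chapter 4 can be illustrated with a minimal sketch of a one-clause-at-a-time learner. This is not the thesis's implementation. It assumes a hypothetical encoding in which each recombinant variant is a tuple of 0/1 values recording which parent contributed each position, a toy rule in which a variant is active only when positions 0 and 2 come from the same parent, and a simple greedy scoring heuristic chosen here for illustration.

```python
from itertools import product


def accepts(clause, x):
    """A clause is a disjunction of literals (position, value)."""
    return any(x[i] == v for i, v in clause)


def learn_clause(positives, negatives, n):
    """Greedily build one clause that accepts every positive example
    while accepting as few of the given negatives as possible."""
    clause = []
    uncovered = list(positives)  # positives not yet accepted by the clause
    rejected = list(negatives)   # negatives the clause still rejects
    literals = [(i, v) for i in range(n) for v in (0, 1)]
    while uncovered:
        def score(lit):
            i, v = lit
            gain = sum(1 for x in uncovered if x[i] == v)
            cost = sum(1 for x in rejected if x[i] == v)
            return gain / (cost + 1)
        i, v = max(literals, key=score)
        clause.append((i, v))
        uncovered = [x for x in uncovered if x[i] != v]
        rejected = [x for x in rejected if x[i] != v]
    return clause


def ocat(positives, negatives, n):
    """Add clauses one at a time until every negative is rejected.
    The result is a conjunction (list) of clauses."""
    cnf = []
    remaining = list(negatives)
    while remaining:
        clause = learn_clause(positives, remaining, n)
        still_accepted = [x for x in remaining if accepts(clause, x)]
        if len(still_accepted) == len(remaining):
            raise ValueError("no clause separates the remaining negatives")
        cnf.append(clause)
        remaining = still_accepted
    return cnf


def predict(cnf, x):
    """A variant is predicted positive if every clause accepts it."""
    return all(accepts(clause, x) for clause in cnf)


# Toy data (hypothetical): 4 positions, active only if positions 0 and 2 match.
n = 4
variants = list(product((0, 1), repeat=n))
positives = [x for x in variants if x[0] == x[2]]
negatives = [x for x in variants if x[0] != x[2]]

cnf = ocat(positives, negatives, n)
positions = {i for clause in cnf for i, _ in clause}
print("clauses:", cnf)
print("positions appearing in the learned rule:", sorted(positions))
```

In this toy setting the positions that appear in the learned clauses are the candidates for an interacting pair. The example enumerates every possible variant and assumes noise-free labels, which a real library would not provide.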
In the experimental study of Chapter 5, positive variants were defined as those that fluoresce with at least 10% of the intensity of the two parent proteins. Using the OCAT algorithm on these sequences, a pair of possibly interacting residues was identified. The three-dimensional structure of DsRed showed that these two residues are very close to each other and also close to the chromophore, which imparts the fluorescence to the two parent proteins. The interaction between these residues was experimentally confirmed with the help of point mutations. The claim that this result can help increase the fraction of active sequences among the variants of subsequent rounds was also supported by further recombinant experiments in which the two identified amino acids were not allowed to be mutated. The results showed close to a 20% improvement in the fraction of positive variants when compared with the recombinant variants of the two parent sequences. On analyzing the variants created in Chapter 5, it was observed that the screening procedure can often give erroneous results. Thus, there is a finite probability of misclassifying a variant, which is known as classification error or classification noise. Unfortunately, the OCAT algorithm, which was used in Chapters 4 and 5, is highly sensitive to classification noise and does not perform well in its presence. Thus, an alternate algorithm is presented in Chapter 6, which is derived from OCAT but is tolerant to classification noise. This algorithm, termed Modified OCAT or mOCAT, was analyzed theoretically. Using PAC learning theory for Boolean Learning, it was shown that mOCAT is efficient and can be used for sample sets with classification noise of less than 50%. An expression was developed for the number of examples required to ensure that the learning is accurate. mOCAT was simulated for the same problem as described in Chapter 4.
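The expression derived in the thesis is not reproduced in this abstract. As a point of comparison only, the classical bound of Angluin and Laird for learning a finite hypothesis class from randomly mislabeled examples shows why a noise rate below 50% matters: the required sample size grows as 1/(1 - 2*eta)^2 and diverges as the noise bound eta approaches one half. The function name and the numerical values below are hypothetical.

```python
from math import ceil, log


def noisy_sample_size(epsilon, delta, n_hypotheses, eta_bound):
    """Sample size from the Angluin-Laird bound for classification noise:
    m >= 2 / (epsilon^2 * (1 - 2*eta_bound)^2) * ln(2 * N / delta).
    This is a comparison point, not the expression developed for mOCAT."""
    if not 0 <= eta_bound < 0.5:
        raise ValueError("noise-rate bound must lie in [0, 0.5)")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    factor = 2.0 / (epsilon ** 2 * (1 - 2 * eta_bound) ** 2)
    return ceil(factor * log(2 * n_hypotheses / delta))


# Hypothetical settings: 10% error, 95% confidence, 1000 candidate hypotheses.
low_noise = noisy_sample_size(0.1, 0.05, 1000, 0.1)
high_noise = noisy_sample_size(0.1, 0.05, 1000, 0.4)
print("examples needed at 10% noise:", low_noise)
print("examples needed at 40% noise:", high_noise)
```

Raising the noise bound from 10% to 40% multiplies the requirement by roughly 16 in this bound, which illustrates how quickly screening errors inflate the amount of sequence data needed.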
These simulations showed that, in the presence of classification noise, mOCAT can outperform not only OCAT but also another existing algorithm. In conclusion, Chapter 7 summarizes the significant findings and overall achievements of this thesis. The approach of identifying indivi
