Crysalis: an integrated server for computational analysis and design of protein crystallization

The failure of multi-step experimental procedures to yield diffraction-quality crystals is a major bottleneck in protein structure determination. Accordingly, several bioinformatics methods have been developed and successfully employed to select crystallizable proteins. Unfortunately, most existing in silico methods only predict crystallization propensity and seldom support the computational design of protein mutants with enhanced crystallizability. Here, we present Crysalis, an integrated crystallization analysis tool that builds on support-vector regression (SVR) models to facilitate computational protein crystallization prediction, analysis, and design. More specifically, the functionality of this new tool includes: (1) rapid selection of target crystallizable proteins at the proteome level, (2) identification of site non-optimality for protein crystallization and systematic analysis of all potential single-point mutations that might enhance protein crystallization propensity, and (3) annotation of the target protein based on predicted structural properties. We applied the design mode of Crysalis to identify site non-optimality for protein crystallization at the proteome scale, focusing on proteins currently classified as non-crystallizable. Our results revealed that site non-optimality is associated with biases in residue type, predicted structure, physicochemical properties, and sequence location, providing an in-depth understanding of the features that influence protein crystallization. Crysalis is freely available at http://nmrcen.xmu.edu.cn/crysalis/.
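
The systematic single-point mutation analysis described in (2) can be illustrated with a minimal sketch. It is not the Crysalis implementation: the scoring function `toy_score` is a hypothetical placeholder standing in for the SVR models, and the names `scan_single_point_mutations` and `AMINO_ACIDS` are assumptions introduced here. The sketch only shows the enumeration step, in which every residue is replaced by each of the 19 alternative amino acids and the mutants are ranked by their change in predicted propensity.

```python
from typing import Callable, List, Tuple

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(seq: str) -> float:
    """Hypothetical placeholder scorer, NOT the Crysalis SVR model.

    Returns the fraction of residues that are not K, E, or Q, chosen
    only so that the example produces non-trivial numbers.
    """
    if not seq:
        return 0.0
    return sum(1 for r in seq if r not in "KEQ") / len(seq)


def scan_single_point_mutations(
    seq: str, score: Callable[[str], float] = toy_score
) -> List[Tuple[int, str, str, float]]:
    """Score every single-point mutant of `seq` against the wild type.

    Returns (position, wild_type, mutant, delta) tuples with 1-based
    positions, sorted by delta (mutant score minus wild-type score),
    largest improvement first.
    """
    base = score(seq)
    results = []
    for i, wt in enumerate(seq):
        for mut in AMINO_ACIDS:
            if mut == wt:
                continue  # skip the identity substitution
            mutant = seq[:i] + mut + seq[i + 1:]
            results.append((i + 1, wt, mut, score(mutant) - base))
    results.sort(key=lambda r: r[3], reverse=True)
    return results


if __name__ == "__main__":
    # Print the five highest-ranked substitutions for a short toy sequence.
    for pos, wt, mut, delta in scan_single_point_mutations("MKEA")[:5]:
        print(f"{wt}{pos}{mut}\tdelta={delta:+.3f}")
```

In a real pipeline the `score` argument would be replaced by a trained predictor. A sequence of length n yields 19n mutants, so the scan remains cheap as long as each scoring call is fast.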
