Propensity Scores for Prediction and Characterization of Bioluminescent Proteins from Sequences

Bioluminescent proteins (BLPs) are a class of proteins from luminous organisms that emit light through various mechanisms, such as bioluminescence and fluorescence. Although BLPs are valuable for commercial and medical applications, identifying them, including luciferases and fluorescent proteins (FPs), is challenging because their sequences are highly diverse. Moreover, characterizing BLPs facilitates mutagenesis analysis aimed at enhancing bioluminescence and fluorescence. This study therefore proposes a novel approach, based on a scoring card method (SCM), that estimates the propensity scores of 400 dipeptides and 20 amino acids in order to design two prediction methods and to characterize BLPs. The SCMBLP method for predicting BLPs achieves a 10-fold cross-validation accuracy of 90.83%, which is higher than that of existing support vector machine based methods, and a test accuracy of 82.85%. A dataset of 269 luciferases and 216 FPs is also established to design the SCMLFP prediction method, which achieves training and test accuracies of 97.10% and 96.28%, respectively. Additionally, the estimated propensity scores are used to identify four informative physicochemical properties of the 20 amino acids that characterize BLPs: 1) high transfer free energy from inside to the protein surface, 2) high occurrence frequency of residues in the transmembrane regions of the protein, 3) large hydrophobicity scale from the native protein structure, and 4) high correlation coefficient (R = 0.921) between the amino acid compositions of BLPs and integral membrane proteins. Further analysis of BLPs reveals that luciferases have a larger value of R (0.937) than FPs (0.635), suggesting that luciferases are more likely than FPs to be located near the cell membrane, where they can conveniently receive extracellular ions. Importantly, the propensity scores of dipeptides and amino acids, together with the identified properties, facilitate efforts to predict, characterize, and apply BLPs, including luciferases, photoproteins, and FPs. The web server is available at http://iclab.life.nctu.edu.tw/SCMBLP/index.html.
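
The abstract does not spell out how a scoring card turns propensity scores into a prediction, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that initial dipeptide propensity scores are taken from the difference in mean dipeptide composition between positive and negative training sequences and rescaled to the range 0 to 1000, that a sequence score is the composition-weighted sum of those scores, and that a sequence is predicted positive when its score exceeds a threshold. All function names and the toy sequences are hypothetical, and any optimization of the scores that the actual SCMBLP and SCMLFP methods may perform is omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 entries


def dipeptide_composition(seq):
    """Return the fraction of each of the 400 dipeptides in a sequence."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {d: 0.0 for d in DIPEPTIDES}
    return {d: c / total for d, c in counts.items()}


def mean_composition(seqs):
    """Average the dipeptide composition over a set of sequences."""
    comps = [dipeptide_composition(s) for s in seqs]
    return {d: sum(c[d] for c in comps) / len(comps) for d in DIPEPTIDES}


def initial_propensity_scores(positives, negatives):
    """Assumed initial scores: class composition difference scaled to 0..1000."""
    pos = mean_composition(positives)
    neg = mean_composition(negatives)
    diff = {d: pos[d] - neg[d] for d in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = hi - lo
    if span == 0:
        return {d: 500.0 for d in DIPEPTIDES}
    return {d: 1000.0 * (diff[d] - lo) / span for d in DIPEPTIDES}


def amino_acid_propensity(scores):
    """Assumed residue score: mean over the 40 dipeptide slots containing it."""
    return {
        a: sum(scores[a + b] + scores[b + a] for b in AMINO_ACIDS) / 40.0
        for a in AMINO_ACIDS
    }


def scm_score(seq, scores):
    """Composition-weighted sum of dipeptide propensity scores."""
    comp = dipeptide_composition(seq)
    return sum(comp[d] * scores[d] for d in DIPEPTIDES)


def predict(seq, scores, threshold):
    """Predict positive when the sequence score exceeds the threshold."""
    return scm_score(seq, scores) > threshold


# Toy, made-up sequences purely for illustration (not real BLP data).
toy_positives = ["AAAAAA", "AAAAAC"]
toy_negatives = ["WWWWWW", "WWWWWY"]
toy_scores = initial_propensity_scores(toy_positives, toy_negatives)
toy_residue_scores = amino_acid_propensity(toy_scores)
```

In practice the threshold would be chosen on training data, for example to maximize training accuracy, and the residue-level scores are what one would compare against published physicochemical indices when characterizing the proteins.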
