Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme

Background: Bioluminescent proteins (BLPs) occur in many living organisms. Because BLPs emit light, they can serve as easily detected biomarkers in biomedical research, for example in gene expression analysis and the study of signal transduction pathways. Accurate identification of BLPs is therefore important for disease diagnosis and biomedical engineering. In this paper, we propose a novel and accurate sequence-based method named PredBLP (Prediction of BioLuminescent Proteins) to predict BLPs.

Results: We collect a series of sequence-derived features that have been shown to be involved in the structure and function of BLPs. These features include amino acid composition, dipeptide composition, sequence motifs and physicochemical properties. We further show that combining all four feature types outperforms any other combination as well as any individual feature type. To remove potentially irrelevant or redundant features, we apply the Fisher Markov Selector together with a Sequential Backward Selection strategy to select the optimal feature subsets. Additionally, we design a lineage-specific scheme, which is shown to be more effective than traditional universal approaches.

Conclusion: Experiments on benchmark datasets demonstrate the robustness of PredBLP. We show that lineage-specific models significantly outperform universal ones. We also test the generalization capability of PredBLP on independent testing datasets as well as on newly deposited BLPs in UniProt. PredBLP outperforms many state-of-the-art methods. A web server named PredBLP, which implements the proposed method, is freely available for academic use.
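To make two of the sequence-derived features named above concrete, the following is a minimal sketch of amino acid composition (20 values) and dipeptide composition (400 values) computed from a protein sequence. The function names, the choice to skip non-standard residues, and the normalization to relative frequencies are assumptions made for illustration; the abstract does not specify how PredBLP computes or scales these features.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 relative frequencies, one per standard amino acid.

    Assumption: non-standard residues (e.g. X, B, Z) are ignored.
    """
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 relative frequencies, one per ordered residue pair.

    Assumption: overlapping pairs containing a non-standard residue are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[p] / total for p in DIPEPTIDES]


def sequence_features(seq):
    """Concatenate both compositions into a single 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

A vector of this kind could be passed, together with motif and physicochemical features, to a feature selector and a classifier; that downstream pipeline is not shown here.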
