Discriminating bioluminescent proteins by incorporating average chemical shift and evolutionary information into the general form of Chou's pseudo amino acid composition.

Bioluminescent proteins are highly sensitive optical reporters for imaging in live animals and have been used extensively in analytical applications such as intracellular monitoring, genetic regulation and detection, and immune and binding assays. In this work, we systematically analyzed the sequence and structure information of 199 bioluminescent and nonbioluminescent proteins, respectively. Based on the results, we present a novel method, auto covariance of averaged chemical shift (acACS), for extracting structure features from a sequence. A support vector machine (SVM) classifier fused with increment of diversity (ID) was used to distinguish bioluminescent proteins from nonbioluminescent proteins by combining dipeptide composition, reduced amino acid composition, evolutionary information, and acACS. The overall prediction accuracy evaluated by jackknife validation reached 82.16%, which was better than that obtained by other existing methods. Under 10-fold cross-validation, the overall prediction accuracy was up to 5.33% higher than that of the SVM with auto covariance of sequential evolution information. The acACS algorithm also outperformed the other feature extraction methods.
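The abstract does not give the acACS formula, so the following is only a minimal sketch of a generic auto covariance transform applied to a per-residue numeric series. It assumes each residue is mapped to one averaged chemical shift value and that the feature for lag `g` is the mean product of mean-centered values `g` positions apart. The table `TOY_ACS` and the function names are hypothetical, and the shift values are placeholders rather than the ones used in the paper.

```python
from typing import Dict, List

# Hypothetical averaged chemical shift per residue (ppm).
# Placeholder numbers for illustration only, not the paper's table.
TOY_ACS: Dict[str, float] = {"A": 52.5, "G": 45.1, "L": 55.2, "K": 56.3, "S": 58.1}


def auto_covariance(values: List[float], max_lag: int) -> List[float]:
    """Return one auto covariance term per lag 1..max_lag.

    Assumed definition: for lag g, average (x_i - mean) * (x_{i+g} - mean)
    over the n - g valid positions.
    """
    n = len(values)
    if max_lag < 1 or max_lag >= n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(values)")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def acacs_features(sequence: str, table: Dict[str, float], max_lag: int) -> List[float]:
    """Map a sequence to shift values, then apply the auto covariance transform."""
    values = [table[residue] for residue in sequence]
    return auto_covariance(values, max_lag)


if __name__ == "__main__":
    # Toy sequence restricted to residues present in the placeholder table.
    print(acacs_features("AGLKSAGL", TOY_ACS, max_lag=3))
```

The fixed-length output (one value per lag) is what makes this kind of transform convenient for an SVM: sequences of different lengths map to vectors of the same dimension, which can then be concatenated with composition-based features.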
