High-Throughput Identification of Mammalian Secreted Proteins Using Species-Specific Scheme and Application to Human Proteome

Secreted proteins are widespread in living organisms and cells. Because they are readily detected in body fluids such as urine and saliva during clinical diagnosis, they serve as important biomarkers for disease diagnosis and play a role in vaccine production. In this study, we propose a novel predictor, based on sequence-derived features, for accurate high-throughput identification of mammalian secreted proteins. We encode the collected proteins by combining amino acid composition, sequence motifs, and physicochemical properties. Detailed feature analyses support the effectiveness of these features. Because secreted proteins differ across species, we introduce a species-specific scheme, which is expected to better capture the intrinsic attributes of the secreted proteins of each species. Experiments on benchmark datasets demonstrate the effectiveness of the proposed method, and a test on an independent dataset indicates good generalization capability. Compared with the traditional universal model, the species-specific scheme significantly improves prediction performance in our experiments. We apply our method to the unreviewed human proteome and find 272 potential secreted proteins with predicted probabilities higher than 99%. A user-friendly web server named iMSPs (identification of Mammalian Secreted Proteins), which implements the proposed method, is freely available for academic use at: http://www.inforstation.com/webservers/iMSP/.
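
As an illustration of the first of these sequence-derived features, the sketch below computes an amino acid composition vector and groups sequences by species so that a separate model could be trained per group. It is a minimal sketch and not the authors' implementation: the function names, the 20-letter alphabet ordering, and the handling of non-standard residues are assumptions, and the motif and physicochemical features are omitted.

```python
from collections import Counter, defaultdict

# Assumed ordering of the 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence.

    Non-standard symbols (e.g. X, B, Z) are ignored; this is an
    assumption of the sketch, not a rule taken from the paper.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def group_by_species(records):
    """Group (species, sequence) pairs into per-species feature lists.

    Each group could then be used to train its own classifier, which is
    the general idea behind a species-specific scheme.
    """
    groups = defaultdict(list)
    for species, sequence in records:
        groups[species].append(amino_acid_composition(sequence))
    return dict(groups)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real proteins.
    toy = [("human", "AAAC"), ("mouse", "MKKL"), ("human", "GGWY")]
    grouped = group_by_species(toy)
    print({species: len(vectors) for species, vectors in grouped.items()})
```

A universal model would instead pool all feature vectors into a single training set; keeping the groups separate is what allows one model per species.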
