NeuroPID: a predictor for identifying neuropeptide precursors from metazoan proteomes

MOTIVATION: The evolution of multicellular organisms is associated with increasing variability of the molecules that govern behavioral and physiological states. This variability is often achieved by neuropeptides (NPs), which are produced in neurons from a longer protein termed a neuropeptide precursor (NPP). NPs mature through a sequence of proteolytic cleavages. NPPs are difficult to identify because they are diverse and because the short, functionally related NPs lack applicable sequence similarity.

RESULTS: Herein, we describe Neuropeptide Precursor Identifier (NeuroPID), a machine learning scheme that predicts metazoan NPPs. NeuroPID was trained on hundreds of identified NPPs from the UniProtKB database. Some 600 features were extracted from the primary sequences and processed using support vector machines (SVMs) and ensemble decision tree classifiers. These features combined biophysical, chemical and informational-statistical properties of NPs and NPPs. Other features were guided by the defining characteristics of the dibasic cleavage site motif. NeuroPID reached 89-94% accuracy and 90-93% precision in cross-validation blind tests against known NPPs (with an emphasis on Chordata and Arthropoda). NeuroPID also identified NPP-like proteins from extensively studied model organisms as well as from poorly annotated proteomes. We then focused on the most significant sets of features that contribute to the success of the classifiers. We propose that NPPs are attractive targets for investigating and modulating behavior, metabolism and homeostasis, and that a rich repertoire of NPs remains to be identified.

AVAILABILITY: NeuroPID source code is freely available at http://www.protonet.cs.huji.ac.il/neuropid
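To make the idea of sequence-derived features concrete, the following is a minimal sketch of how a few features of the kinds mentioned above (amino acid composition, an information-statistical measure, and counts of dibasic motifs) might be computed from a primary sequence. It is not the NeuroPID implementation: the abstract does not list the roughly 600 features, so the function names, the feature names, the choice of KK/KR/RK/RR as dibasic motifs, and the toy sequence are all assumptions made for illustration.

```python
import math
from collections import Counter

# Assumption: the four lysine/arginine pairs stand in for "dibasic" motifs.
DIBASIC_MOTIFS = ("KK", "KR", "RK", "RR")
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_overlapping(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps (e.g. KKK has two KK)."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def shannon_entropy(seq):
    """Shannon entropy (bits) of the residue distribution in seq."""
    total = len(seq)
    counts = Counter(seq)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_features(seq):
    """Return a dict of illustrative features for one protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    features = {"length": len(seq), "entropy": shannon_entropy(seq)}
    # Fractional amino acid composition (a simple chemical/biophysical proxy).
    for aa in AMINO_ACIDS:
        features["frac_" + aa] = seq.count(aa) / len(seq)
    # Raw counts of each dibasic motif, plus their total.
    total = 0
    for motif in DIBASIC_MOTIFS:
        n = count_overlapping(seq, motif)
        features["dibasic_" + motif] = n
        total += n
    features["dibasic_total"] = total
    return features


if __name__ == "__main__":
    toy_sequence = "MKRAAKKGG"  # hypothetical sequence, not a real NPP
    for name, value in extract_features(toy_sequence).items():
        if value:
            print(name, value)
```

A feature dictionary like this could be flattened into a fixed-order numeric vector and passed to a classifier such as an SVM or a tree ensemble; the choice and tuning of that classifier are outside the scope of this sketch.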
