Enhanced Missing Proteins Detection in NCI60 Cell Lines Using an Integrative Search Engine Approach

The Human Proteome Project (HPP) aims to decipher the complete map of the human proteome. In the past few years, HPP teams have devoted significant effort to the experimental detection of the missing proteins, which lack reliable mass spectrometry evidence of their existence. In this endeavor, an in-depth analysis of shotgun experiments might be a valuable resource for selecting a biological matrix when designing validation experiments. In this work, we used all the proteomic experiments from the NCI60 cell lines and applied an integrative approach based on the results obtained from Comet, Mascot, OMSSA, and X!Tandem. This workflow exploits the complementarity of these search engines to increase proteome coverage. Five missing proteins that comply with the C-HPP guidelines were identified, although further validation is needed. Moreover, 165 missing proteins were detected with only one unique peptide, and their functional analysis supported their participation in cellular pathways, as has also been proposed in other studies. Finally, we performed a combined analysis of the gene expression levels and the proteomic identifications from the cell lines common to the NCI60 and the CCLE project to suggest alternatives for further validation of missing protein observations.
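The abstract does not spell out how the outputs of the four engines are merged or how guideline compliance is checked. The Python sketch below illustrates one simple way such an integrative workflow could be organised; it is not the authors' pipeline. It assumes a hypothetical `PSM` record, a per-engine target-decoy filter at 1% FDR (q-value), a plain union of the accepted peptides across engines, and a precomputed map containing only uniquely mapping peptides. Proteins are labelled using the HPP Mass Spectrometry Data Interpretation Guidelines 2.1 criterion of at least two uniquely mapping, non-nested peptides of at least nine residues; all function names are illustrative.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match (hypothetical record layout)."""
    engine: str      # e.g. "Comet", "Mascot", "OMSSA", "X!Tandem"
    peptide: str     # unmodified amino-acid sequence
    score: float     # engine score, higher = better (assumed)
    is_decoy: bool   # True if the match is to a decoy sequence


def target_decoy_filter(psms, max_fdr=0.01):
    """Return target peptides whose target-decoy q-value is <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))  # estimated FDR at this cutoff
    # q-value: lowest FDR reachable at this score or any lower score cutoff
    qvals, running = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qvals[i] = running
    return {p.peptide for p, q in zip(ranked, qvals)
            if not p.is_decoy and q <= max_fdr}


def integrate_engines(psms, max_fdr=0.01):
    """Filter each engine separately, then pool (union) the accepted peptides."""
    by_engine = {}
    for p in psms:
        by_engine.setdefault(p.engine, []).append(p)
    accepted = set()
    for engine_psms in by_engine.values():
        accepted |= target_decoy_filter(engine_psms, max_fdr)
    return accepted


def classify_protein(peptides, min_len=9):
    """Label one protein from its set of uniquely mapping peptides."""
    peps = set(peptides)
    long_peps = [p for p in peps if len(p) >= min_len]
    # two peptides are non-nested if neither sequence contains the other
    if any(a not in b and b not in a for a, b in combinations(long_peps, 2)):
        return "guideline-compliant"
    if len(peps) == 1:
        return "single-peptide"
    return "insufficient" if peps else "not-detected"


def summarize(psms, unique_map, missing_proteins, max_fdr=0.01):
    """Map pooled peptides to missing proteins and classify each protein.

    unique_map: peptide -> protein, assumed to contain only peptides that
    map to a single protein in the search database (hypothetical input).
    """
    per_protein = {}
    for pep in integrate_engines(psms, max_fdr):
        prot = unique_map.get(pep)
        if prot in missing_proteins:
            per_protein.setdefault(prot, set()).add(pep)
    return {prot: classify_protein(peps) for prot, peps in per_protein.items()}


if __name__ == "__main__":
    toy = [
        PSM("Comet", "AAAAAAAAAK", 10.0, False),
        PSM("Comet", "CCCCCCCCCK", 9.0, False),
        PSM("Comet", "DECOYPEPK", 1.0, True),
        PSM("X!Tandem", "GGGGGGGGGGK", 8.0, False),
    ]
    peptide_to_protein = {"AAAAAAAAAK": "P1", "CCCCCCCCCK": "P1",
                          "GGGGGGGGGGK": "P2"}
    result = summarize(toy, peptide_to_protein, {"P1", "P2"})
    # prints "P1 guideline-compliant" then "P2 single-peptide"
    for prot, label in sorted(result.items()):
        print(prot, label)
```

A plain union is the simplest way to exploit engine complementarity, but pooling sets that were each filtered at 1% FDR can let the combined error rate exceed 1%. Many multi-engine workflows instead control FDR after combining engine results (for example, by combining scores probabilistically), so this sketch should be read only as an outline of the data flow.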
