Comparative analysis of different label-free mass spectrometry based protein abundance estimates and their correlation with RNA-Seq gene expression data.

An increasing number of studies involve integrative analysis of gene and protein expression data, taking advantage of new technologies such as next-generation transcriptome sequencing (RNA-Seq) and highly sensitive mass spectrometry (MS) instrumentation. It is therefore timely to revisit the correlative analysis of gene and protein expression data using more recently generated data sets. Furthermore, within the proteomics community there is substantial interest in comparing the performance of different label-free quantitative proteomic strategies, and gene expression data can serve as an indirect benchmark for such protein-level comparisons. In this work we use publicly available mouse data to perform a joint analysis of genomic and proteomic data obtained from the same organism. On the proteomic side, we compare different label-free protein quantification methods (intensity-based and spectral count-based, each with various associated data normalization steps) using several software tools. On the genomic side, we similarly perform a correlative analysis of gene expression data derived using microarray and RNA-Seq methods. We also investigate the correlation between gene and protein expression data, and various factors affecting the accuracy of quantitation at both levels. We observe that spectral count-based protein abundance metrics, which are easy to extract from any published data, are comparable to intensity-based measures with respect to correlation with gene expression data. The results of this work should be useful for designing robust computational pipelines for extraction and joint analysis of gene and protein expression data in the context of integrative studies.
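As a rough illustration of the kind of comparison described above, the following Python sketch computes one commonly used spectral count-based metric, the normalized spectral abundance factor (NSAF), and its Spearman rank correlation with RNA-Seq expression values. The abstract does not state which metrics, normalizations, or correlation measures were used, so the choice of NSAF and Spearman correlation is an assumption, and the gene identifiers, counts, protein lengths, and RPKM values are invented toy data.

```python
import math


def nsaf(spectral_counts, lengths):
    """NSAF: each protein's count/length ratio divided by the sum of all such ratios."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: v / total for p, v in saf.items()}


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data (not taken from the study).
spectral_counts = {"geneA": 120, "geneB": 30, "geneC": 5, "geneD": 60}
protein_lengths = {"geneA": 400, "geneB": 300, "geneC": 500, "geneD": 150}
rna_seq_rpkm = {"geneA": 85.0, "geneB": 40.0, "geneC": 2.5, "geneD": 12.0}

abundance = nsaf(spectral_counts, protein_lengths)

# Correlate only genes quantified at both the protein and transcript level.
shared = sorted(set(abundance) & set(rna_seq_rpkm))
rho = spearman([abundance[g] for g in shared], [rna_seq_rpkm[g] for g in shared])
print(f"Spearman correlation over {len(shared)} genes: {rho:.2f}")
```

A rank-based correlation is used here because both spectral counts and RNA-Seq values typically span several orders of magnitude; log-transforming both and using Pearson correlation would be another reasonable choice.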
