PatternLab for proteomics: a tool for differential shotgun proteomics

BackgroundA goal of proteomics is to distinguish between states of a biological system by identifying protein expression differences. Liu et al. demonstrated a method to perform semi-relative protein quantitation in shotgun proteomics data by correlating the number of tandem mass spectra obtained for each protein, or "spectral count", with its abundance in a mixture; however, two issues have remained open: how to normalize spectral counting data and how to efficiently pinpoint differences between profiles. Moreover, Chen et al. recently showed how to increase the number of identified proteins in shotgun proteomics by analyzing samples with different MS-compatible detergents while performing proteolytic digestion. The latter introduced new challenges as seen from the data analysis perspective, since replicate readings are not acquired.ResultsTo address the open issues above, we present a program termed PatternLab for proteomics. This program implements existing strategies and adds two new methods to pinpoint differences in protein profiles. The first method, ACFold, addresses experiments with less than three replicates from each state or having assays acquired by different protocols as described by Chen et al. ACFold uses a combined criterion based on expression fold changes, the AC test, and the false-discovery rate, and can supply a "bird's-eye view" of differentially expressed proteins. The other method addresses experimental designs having multiple readings from each state and is referred to as nSVM (natural support vector machine) because of its roots in evolutionary computing and in statistical learning theory. Our observations suggest that nSVM's niche comprises projects that select a minimum set of proteins for classification purposes; for example, the development of an early detection kit for a given pathology. We demonstrate the effectiveness of each method on experimental data and confront them with existing strategies.ConclusionPatternLab offers an easy and unified access to a variety of feature selection and normalization strategies, each having its own niche. Additionally, graphing tools are available to aid in the analysis of high throughput experimental data. PatternLab is available at http://pcarvalho.com/patternlab.

[1]  T. Shaler,et al.  Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. , 2003, Analytical chemistry.

[2]  David L. Tabb,et al.  A proteomic view of the Plasmodium falciparum life cycle , 2002, Nature.

[3]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[4]  J. Yates,et al.  Large-scale analysis of the yeast proteome by multidimensional protein identification technology , 2001, Nature Biotechnology.

[5]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[6]  J. Yates,et al.  Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. , 1995, Analytical chemistry.

[7]  Francisco Azuaje,et al.  An assessment of recently published gene expression data analyses: reporting experimental design and statistical factors , 2006, BMC Medical Informatics Decis. Mak..

[8]  N. Samatova,et al.  Detecting differential and correlated protein expression in label-free shotgun proteomics. , 2006, Journal of proteome research.

[9]  Jae K. Lee,et al.  Local-pooled-error test for identifying differentially expressed genes with a small number of replicated microarrays , 2003, Bioinform..

[10]  J. Claverie,et al.  The significance of digital gene expression profiles. , 1997, Genome research.

[11]  Nasreen S Jessani,et al.  A streamlined platform for high-content functional proteomics of primary human specimens , 2005, Nature Methods.

[12]  John R Yates,et al.  Optimization of mass spectrometry-compatible surfactants for shotgun proteomics. , 2007, Journal of proteome research.

[13]  Paul Terry,et al.  Application of the GA/KNN method to SELDI proteomics data , 2004, Bioinform..

[14]  Jean-Jacques Daudin,et al.  Determination of the differentially expressed genes in microarray experiments using local FDR , 2004, BMC Bioinformatics.

[15]  Nelson Spector,et al.  Differential protein expression patterns obtained by mass spectrometry can aid in the diagnosis of Hodgkin's disease. , 2007, Journal of experimental therapeutics & oncology.

[16]  J. X. Pang,et al.  Biomarker discovery in urine by proteomics. , 2002, Journal of proteome research.

[17]  W. Teahan,et al.  Experiments on the zero frequency problem , 1995, Proceedings DCC '95 Data Compression Conference.

[18]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[19]  John H. Holland,et al.  Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .

[20]  J. Yates,et al.  A model for random sampling and estimation of relative protein abundance in shotgun proteomics. , 2004, Analytical chemistry.

[21]  Thorsten Joachims,et al.  Making large scale SVM learning practical , 1998 .

[22]  Adrian E. Raftery,et al.  Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data , 2005, Bioinform..

[23]  Jean Yee Hwa Yang,et al.  Gene expression Identifying differentially expressed genes from microarray experiments via statistic synthesis , 2005 .

[24]  J. Yates,et al.  DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. , 2002, Journal of proteome research.

[25]  Jan M. Van Campenhout,et al.  On the Possible Orderings in the Measurement Selection Problem , 1977, IEEE Transactions on Systems, Man, and Cybernetics.

[26]  Vladimir Vapnik,et al.  The Nature of Statistical Learning , 1995 .

[27]  Y. Benjamini,et al.  Controlling the false discovery rate: a practical and powerful approach to multiple testing , 1995 .

[28]  Matej Oresic,et al.  Processing methods for differential analysis of LC/MS profile data , 2005, BMC Bioinformatics.

[29]  J. Yates,et al.  Performance of a linear ion trap-Orbitrap hybrid for peptide analysis. , 2006, Analytical chemistry.

[30]  K. Becker,et al.  Analysis of microarray data using Z score transformation. , 2003, The Journal of molecular diagnostics : JMD.

[31]  Bernhard Schölkopf,et al.  Feature Selection for Support Vector Machines Using Genetic Algorithms , 2004, Int. J. Artif. Intell. Tools.

[32]  D. N. Perkins,et al.  Probability‐based protein identification by searching sequence databases using mass spectrometry data , 1999, Electrophoresis.

[33]  J. Yates,et al.  An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database , 1994, Journal of the American Society for Mass Spectrometry.