Normalization and missing value imputation for label-free LC-MS analysis

Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.
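As a minimal sketch of the workflow outlined above (normalize, then impute), the following Python example log2-transforms a small peptide-by-sample intensity matrix, centers each sample on a common median, and fills the remaining gaps. The choice of median centering and of per-peptide minimum imputation is an assumption made for illustration only, as are all function names and the toy data; it is not necessarily one of the approaches discussed here, and minimum imputation in particular presumes that values are missing mainly because of low abundance.

```python
import math
from statistics import median

def log2_transform(matrix):
    """Log2-transform raw intensities; missing values (None) stay None."""
    return [[math.log2(v) if v is not None else None for v in row]
            for row in matrix]

def median_center(matrix):
    """Shift each sample (column) so its observed median equals a common value."""
    n_cols = len(matrix[0])
    col_medians = [
        median(row[j] for row in matrix if row[j] is not None)
        for j in range(n_cols)
    ]
    grand = median(col_medians)  # common target level for all samples
    return [[v - col_medians[j] + grand if v is not None else None
             for j, v in enumerate(row)]
            for row in matrix]

def impute_row_min(matrix):
    """Replace each missing value with that peptide's minimum observed value."""
    out = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = min(observed) if observed else None  # all-missing rows stay missing
        out.append([fill if v is None else v for v in row])
    return out

# Hypothetical toy data: rows are peptides, columns are samples.
raw = [
    [1024.0, 2048.0, 1024.0],
    [256.0,  None,   256.0],
    [4096.0, 8192.0, None],
]

normalized = median_center(log2_transform(raw))
complete = impute_row_min(normalized)
```

Normalization is applied before imputation here so that the fill values are computed on the bias-corrected scale; reversing the order would let sample-level shifts leak into the imputed values.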
