Missing values in gel‐based proteomics

Gel‐based proteomics is a widely applied technique for measuring protein abundances in various biological systems. Comparing two or more biological groups involves matching 2‐D gels. Depending on the software, this matching can leave spots with missing values on several gels. Most studies ignore this fact or replace all missing values with zero. In recent years, scientists have realized that this is not the optimal way to analyze such data, and several studies have presented methods for imputing missing proteomics data. Most of these methods were previously applied to microarray data, a field in which the phenomenon of missing data is also well known. With this review, we intend to further raise awareness of the problem of missing values in gel‐based proteomics. We summarize reasons for missing values and explore their distribution in data sets. We also compare and evaluate the imputation methods proposed so far for gel‐based proteomics data.
