Statistical quality assessment and outlier detection for liquid chromatography-mass spectrometry experiments

Background: Quality assessment methods that are commonplace in engineering and industrial production are not yet widespread in large-scale proteomics experiments. Modern technologies such as multi-dimensional liquid chromatography coupled to mass spectrometry (LC-MS), however, produce large quantities of proteomic data. These data are prone to measurement errors and reproducibility problems, so automatic quality assessment and control are becoming increasingly important.

Results: We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust Principal Component Analysis.

Conclusion: We evaluate our approach on several data sets of different complexity and show that it precisely detects LC-MS runs of poor signal quality in large-scale studies.
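To illustrate the idea of scoring runs by their Mahalanobis distance, the following is a minimal sketch and not the authors' implementation. It assumes each LC-MS run has already been summarized by a vector of quality descriptors; the two descriptor values per run, the function names, and the data are all hypothetical. For brevity it uses the classical mean and covariance, whereas the abstract describes a robust variant, so a real analysis would substitute robust estimates of location and scatter.

```python
import math

def mean_vector(rows):
    """Column-wise mean of the descriptor vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows, center):
    """Sample covariance (n - 1 denominator) around the given center."""
    n, d = len(rows), len(center)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - c for x, c in zip(row, center)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[v / (n - 1) for v in r] for r in cov]

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    d = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, d):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, d + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, d))
        x[i] = (aug[i][d] - s) / aug[i][i]
    return x

def mahalanobis_distances(rows):
    """Distance of every row from the center, scaled by the covariance."""
    center = mean_vector(rows)
    cov = covariance_matrix(rows, center)
    distances = []
    for row in rows:
        diff = [x - c for x, c in zip(row, center)]
        z = solve(cov, diff)  # z = cov^{-1} * diff, without forming the inverse
        distances.append(math.sqrt(sum(a * b for a, b in zip(diff, z))))
    return distances

# Hypothetical descriptors for eight runs; the last run is deliberately aberrant.
runs = [
    (1.00, 10.2), (1.10, 10.9), (0.90, 9.1), (1.05, 10.4),
    (0.95, 9.6), (1.02, 10.1), (0.98, 9.9), (1.00, 3.0),
]
distances = mahalanobis_distances(runs)
suspect = max(range(len(runs)), key=distances.__getitem__)
print("run with the largest distance:", suspect)
```

Because a gross outlier inflates the classical covariance estimate, its distance is compressed toward those of the good runs. That masking effect is one reason a robust estimate is preferable when several runs may be of poor quality.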
