An insight into high-resolution mass-spectrometry data.

Mass spectrometry is a powerful tool with much promise in global proteomic studies. The discipline of statistics offers robust methodologies for extracting and interpreting high-dimensional mass-spectrometry data and stands to be a valuable contributor to the field. Here, we describe the process by which the data are produced, the characteristics of the data, and the analytical preprocessing steps needed to interpret the data and use them in downstream statistical analyses. Because of the complexity of data acquisition, statistical methods developed for gene expression microarray data are not directly applicable to proteomic data. Areas in need of statistical research for proteomic data include alignment, experimental design, abundance normalization, and statistical analysis.
