NORMALIZATION, BASELINE CORRECTION AND ALIGNMENT OF HIGH-THROUGHPUT MASS SPECTROMETRY DATA

We propose several preprocessing steps to be applied to high-throughput mass spectrometry (MS) data before biomarker clustering or classification. These preprocessing steps for the mass spectra are multiple alignment of technical replicates, baseline correction, and normalization along the mass/charge (m/z) axis. While the benefits of baseline correction and alignment seem obvious, we studied the benefit of normalization more carefully, using human prostate cancer SELDI-TOF MS data (obtained from the Virginia Prostate Center Tissue and Body Fluid Bank and approved by the Eastern Virginia Medical School). We show on these data that our global normalization by scaling helps distinguish between different cancer groups as well as between cancer and non-cancer groups. We used the between-to-within sum of squares ratio introduced by Fisher, together with visual inspection, to illustrate the improvement brought by normalization.
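The two quantities at the heart of this study, a global normalization by scaling and Fisher's between-to-within sum of squares (B/W) ratio, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which constant each spectrum is scaled by, so the sketch assumes the total ion current (sum of intensities) and rescales every spectrum to the average total. The function names `scale_normalize` and `bw_ratio` and the toy spectra are hypothetical.

```python
from statistics import fmean


def scale_normalize(spectra, target=None):
    """Global normalization by scaling: multiply each spectrum by one constant
    so that all spectra end up with the same total intensity.

    ASSUMPTION: the scaling factor is the total ion current (sum of all
    intensities); the abstract does not specify the factor used.
    """
    totals = [sum(s) for s in spectra]
    if any(t <= 0 for t in totals):
        raise ValueError("each spectrum needs a positive total intensity")
    if target is None:
        target = fmean(totals)  # rescale every spectrum to the mean total
    return [[x * target / t for x in s] for s, t in zip(spectra, totals)]


def bw_ratio(spectra, labels, j):
    """Between-to-within sum of squares ratio for the intensity at m/z index j.

    BSS = sum over groups of n_k * (group mean - overall mean)^2
    WSS = sum over groups of sum of (value - group mean)^2
    A larger ratio means the groups are better separated at that position.
    """
    values = [s[j] for s in spectra]
    overall = fmean(values)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    between = within = 0.0
    for members in groups.values():
        m = fmean(members)
        between += len(members) * (m - overall) ** 2
        within += sum((v - m) ** 2 for v in members)
    # Perfect separation (no within-group spread) gives an infinite ratio.
    return between / within if within > 0 else float("inf")


# Hypothetical toy data: two groups whose spectra differ only by a
# per-spectrum global intensity factor (1x or 3x).
raw = [[1, 2, 1], [3, 6, 3], [2, 1, 1], [6, 3, 3]]
labels = ["A", "A", "B", "B"]
norm = scale_normalize(raw)
print(bw_ratio(raw, labels, 0), bw_ratio(norm, labels, 0))
```

On this synthetic example the B/W ratio at the first m/z position rises from 0.4 on the raw spectra to infinity after scaling, because the only within-group variation was a global intensity factor that normalization removes. Real spectra also carry noise, so the gain would be finite; ranking m/z positions by this ratio before and after normalization is one way to compare the two.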