Novel Approaches to Visualization and Data Mining Reveal Diagnostic Information in the Low Amplitude Region of Serum Mass Spectra from Ovarian Cancer Patients

The ability to identify diagnostic signature patterns in proteomic data generated by high-throughput mass spectrometry (MS) based serum analysis has recently generated much excitement and interest in the scientific community. These data sets can be very large, with high-resolution MS instrumentation producing 1–2 million data points per sample. Unsupervised and supervised data mining of mass spectral data would benefit greatly from tools that reduce the data effectively without losing important diagnostic information. In the past, investigators have proposed approaches in which data reduction is performed by a priori “peak picking” together with alignment, warping, and smoothing components that rely on rule-based signal-to-noise measurements. Unfortunately, while this type of system has been employed for gene microarray analysis, it is unclear whether it will be effective for mass spectral data, which, unlike microarray data, consist of continuous measurements. Moreover, it is unclear where noise ends and true signal begins. We have therefore developed an approach to MS data analysis that uses new types of data visualization and mining operations, in which data reduction is accomplished by culling on the intensity of the peaks themselves rather than on their location. Applying this new analysis method to a large study set of high-resolution mass spectra from healthy individuals and ovarian cancer patients shows that all of the diagnostic information is contained within the very lowest amplitude regions of the mass spectra. This region can then be selected and studied to identify the exact location and amplitude of the diagnostic biomarkers.
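
As a rough illustration of culling by intensity rather than by location, the following minimal sketch keeps only the data points whose amplitude falls inside a chosen band. The function name `cull_by_amplitude`, the band limits, and the toy spectrum are assumptions made for this example; they are not taken from the study, which does not specify its implementation or thresholds here.

```python
from typing import List, Sequence, Tuple


def cull_by_amplitude(
    mz: Sequence[float],
    intensity: Sequence[float],
    low: float,
    high: float,
) -> List[Tuple[float, float]]:
    """Return (m/z, intensity) pairs whose intensity lies in [low, high].

    Hypothetical helper: selection depends only on amplitude, so no
    peak picking, alignment, or smoothing is applied beforehand.
    """
    if len(mz) != len(intensity):
        raise ValueError("mz and intensity must have the same length")
    return [(m, a) for m, a in zip(mz, intensity) if low <= a <= high]


# Toy spectrum (made-up values, arbitrary intensity units).
mz = [1000.0, 1000.5, 1001.0, 1001.5, 1002.0]
intensity = [0.2, 35.0, 1.1, 0.05, 80.0]

# Keep only the low amplitude region; the cutoff of 2.0 is arbitrary.
low_band = cull_by_amplitude(mz, intensity, low=0.0, high=2.0)
print(low_band)
```

Because the m/z value of each retained point is kept alongside its intensity, the selected band can still be examined afterwards to see where the surviving points sit in the spectrum.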
