Information Preserving Component Analysis: Data Projections for Flow Cytometry Analysis

Flow cytometry is often used to characterize the malignant cells in leukemia and lymphoma patients, with measurements made at the level of the individual cell. Typically, flow-cytometric data analysis is performed through a series of 2-D projections onto the axes of the data set. Over the years, clinicians have identified combinations of fluorescent markers that produce relatively well-known expression patterns for specific subtypes of leukemia and lymphoma, which are cancers of the hematopoietic system. Because only a series of 2-D projections is viewed, the high-dimensional nature of the data is rarely exploited. In this paper we present a means of determining a low-dimensional projection that maintains the high-dimensional relationships (i.e., information distance) between differing oncological data sets. By using machine learning techniques, we allow clinicians to visualize data in a low-dimensional space defined by linear combinations of all of the available markers, rather than just two at a time. This provides an aid in diagnosing similar forms of cancer, as well as a means for variable selection in exploratory flow-cytometric research. We refer to our method as information preserving component analysis (IPCA).
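The idea of a projection that "maintains" pairwise information distances can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how densities or distances are estimated, so the sketch assumes each data set is summarized by a Gaussian fit and uses the symmetric Kullback-Leibler divergence as a stand-in for the information distance. The function names (`gaussian_sym_kl`, `distance_matrix`, `preservation_error`) and the synthetic data are hypothetical, and the code only scores a given projection matrix rather than optimizing one.

```python
import numpy as np

def gaussian_sym_kl(x, y):
    """Symmetric KL divergence between Gaussian fits to x and y (rows = cells).

    Assumption: a Gaussian fit stands in for whatever density estimate
    the actual method uses.
    """
    d = x.shape[1]
    dm = x.mean(axis=0) - y.mean(axis=0)
    # A small ridge keeps the covariance estimates invertible.
    cx = np.atleast_2d(np.cov(x, rowvar=False)) + 1e-6 * np.eye(d)
    cy = np.atleast_2d(np.cov(y, rowvar=False)) + 1e-6 * np.eye(d)
    icx, icy = np.linalg.inv(cx), np.linalg.inv(cy)
    # KL(p||q) + KL(q||p) for Gaussians; the log-determinant terms cancel.
    return 0.5 * (np.trace(icy @ cx) + np.trace(icx @ cy) - 2 * d
                  + dm @ (icx + icy) @ dm)

def distance_matrix(datasets, A=None):
    """Pairwise distances between data sets, optionally after projecting by A."""
    proj = [X if A is None else X @ A.T for X in datasets]
    n = len(proj)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = gaussian_sym_kl(proj[i], proj[j])
    return D

def preservation_error(datasets, A):
    """Squared Frobenius gap between full-dimensional and projected distances."""
    gap = distance_matrix(datasets) - distance_matrix(datasets, A)
    return np.linalg.norm(gap, "fro") ** 2

# Synthetic example: three 5-marker "patients" that differ only in marker 0.
rng = np.random.default_rng(0)
datasets = [rng.normal(size=(2000, 5)) + np.array([shift, 0, 0, 0, 0])
            for shift in (0.0, 1.5, 3.0)]

A_informative = np.eye(5)[[0, 1]]    # keeps the marker that separates patients
A_uninformative = np.eye(5)[[3, 4]]  # keeps two markers that carry no difference

print(preservation_error(datasets, A_informative))
print(preservation_error(datasets, A_uninformative))
```

Under these assumptions, the projection that retains the discriminating marker yields a much smaller error than the one that discards it. A full method would search over projection matrices with orthonormal rows to minimize such an error, rather than compare two fixed axis-aligned choices.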
