Analysis of Medical Data Using Dimensionality Reduction Techniques

High-dimensional data can be difficult to analyze, almost impossible to visualize, and expensive to process and store. In many cases, the high-dimensional data points all lie on or close to a much lower-dimensional surface, or manifold, which implies that the intrinsic dimensionality of the data is much lower. The data can then be described with fewer dimensions, which mitigates the curse of dimensionality. Transforming the high-dimensional representation of the data into a lower-dimensional one without losing important information is the central problem of dimensionality reduction. Many methods of dimensionality reduction have been developed, including classical techniques such as Principal Component Analysis (PCA) and newer methods such as Diffusion Maps (DM). Most of these methods perform well on some types of data but poorly on others. We apply different dimensionality reduction methods to medical data, including breast tissue tumor data and kidney proteomics data, to determine which methods and parameters work best on each type of data. To evaluate the performance of each reduction method, we also classify the data in the reduced dimension using standard classification algorithms and evaluate the accuracy.
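The reduce-then-classify evaluation described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: the data are synthetic (two classes placed near a 2-D subspace of a 50-dimensional space), PCA is computed with a plain SVD, and a k-nearest-neighbor vote stands in for the "standard classification algorithms". All function names, the 70/30 split, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project centered data onto the retained principal directions."""
    return (X - mean) @ components.T

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test point by majority vote among its k nearest neighbors."""
    # Squared Euclidean distance between every test and every training point.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])

def make_synthetic(n_per_class=100, n_features=50, seed=0):
    """Hypothetical stand-in data: two classes near a 2-D linear subspace."""
    rng = np.random.default_rng(seed)
    latent0 = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(n_per_class, 2))
    latent1 = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(n_per_class, 2))
    latent = np.vstack([latent0, latent1])
    embed = rng.normal(size=(2, n_features))  # random linear embedding
    noise = 0.1 * rng.normal(size=(2 * n_per_class, n_features))
    X = latent @ embed + noise
    y = np.repeat([0, 1], n_per_class)
    return X, y

def evaluate(n_components=2, k=5, seed=0):
    """Reduce with PCA fitted on the training split, then score k-NN accuracy."""
    X, y = make_synthetic(seed=seed)
    rng = np.random.default_rng(seed + 1)
    idx = rng.permutation(len(y))
    split = int(0.7 * len(y))  # assumed 70/30 train/test split
    train, test = idx[:split], idx[split:]
    mean, comps = pca_fit(X[train], n_components)
    Z_train = pca_transform(X[train], mean, comps)
    Z_test = pca_transform(X[test], mean, comps)
    y_pred = knn_predict(Z_train, y[train], Z_test, k)
    return float((y_pred == y[test]).mean())

if __name__ == "__main__":
    for d in (1, 2, 5):
        print(f"components={d}: accuracy={evaluate(n_components=d):.3f}")
```

Fitting the projection on the training split only keeps test information out of the reduction step, so the reported accuracy reflects how well the reduced representation generalizes. Comparing methods or parameters then amounts to swapping the reduction step or varying `n_components` and repeating the evaluation.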
