A General Framework for Dimensionality Reduction for Large Data Sets

With electronic data growing dramatically in almost all areas of research, a plethora of new techniques for automatic dimensionality reduction and data visualization has become available in recent years. These techniques offer an interface that allows humans to rapidly scan through large volumes of data. As data sets grow larger, however, the standard methods can no longer be applied directly. Random subsampling and prior clustering are still among the most popular solutions in this case; we discuss a principled alternative and formalize these approaches under a general perspective of dimensionality reduction as cost optimization. We also take a first look at the question of whether these techniques can be accompanied by theoretical guarantees.
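To make the cost-optimization view concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a classical metric-MDS stress as the cost function, random subsampling as the baseline for large data sets, and plain gradient descent as the optimizer. All function names, parameters, and the step size are hypothetical choices made for illustration.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix between all rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def stress(embedding, target_dist):
    """Assumed cost: sum over pairs i<j of (d_ij - D_ij)^2."""
    residual = pairwise_distances(embedding) - target_dist
    return 0.5 * (residual ** 2).sum()


def stress_gradient(embedding, target_dist):
    """Gradient of the stress with respect to every embedded point."""
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist = np.maximum(dist, 1e-12)  # guard against division by zero
    weights = 2.0 * (dist - target_dist) / dist
    np.fill_diagonal(weights, 0.0)  # a point exerts no force on itself
    return (weights[:, :, None] * diff).sum(axis=1)


def embed_subsample(data, n_sub=50, n_dims=2, n_steps=200, seed=0):
    """Embed a random subsample of `data` by minimizing the stress."""
    rng = np.random.default_rng(seed)
    # Random subsampling: the simple baseline for large data sets.
    idx = rng.choice(len(data), size=n_sub, replace=False)
    target = pairwise_distances(data[idx])
    emb = rng.normal(scale=1e-2, size=(n_sub, n_dims))
    history = [stress(emb, target)]
    step = 1.0 / (2 * n_sub)  # conservative step size (an assumption)
    for _ in range(n_steps):
        emb -= step * stress_gradient(emb, target)
        history.append(stress(emb, target))
    return idx, emb, history


if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(1000, 10))
    idx, emb, history = embed_subsample(data)
    print(f"stress: {history[0]:.1f} -> {history[-1]:.1f}")
```

The sketch embeds only the subsample; the remaining points are simply ignored. That limitation is what makes plain subsampling unsatisfactory for large data sets.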
