Heterogeneous Datasets Representation and Learning using Diffusion Maps and Laplacian Pyramids

Diffusion maps, together with geometric harmonics, provide a method for describing the geometry of high-dimensional data and for extending these descriptions to new data points and to functions defined on the data. The method suffers from two limitations. First, even though real-life data are often heterogeneous, diffusion maps assume that the attributes of the processed dataset are comparable. Second, applying geometric harmonics requires careful tuning of the extension scale and of the condition number. In this paper, we propose a method for representing and learning heterogeneous datasets: diffusion maps are used to unify and embed the heterogeneous data, and geometric harmonics are replaced by the Laplacian pyramid extension. Experimental results on three benchmark datasets demonstrate that the learning process becomes straightforward when the constructed representation smoothly parameterizes the task-related function.
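
As a rough illustration of the two building blocks, the sketch below implements a plain diffusion-maps embedding with a Gaussian kernel and a multiscale, Laplacian-pyramid-style out-of-sample extension in NumPy. The function names (diffusion_maps, laplacian_pyramid_extend), the kernel choice, the bandwidth parameters epsilon and sigma0, and the number of pyramid levels are illustrative assumptions, not the paper's exact construction (which additionally unifies heterogeneous attributes before embedding).

```python
import numpy as np

def diffusion_maps(X, epsilon, n_components=2, t=1):
    """Minimal diffusion-maps embedding of the rows of X (samples x features)."""
    # Pairwise squared Euclidean distances and Gaussian affinities.
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    W = np.exp(-dist2 / epsilon)
    # Symmetrically normalized kernel; its eigenvectors recover the right
    # eigenvectors of the Markov matrix P = D^{-1} W.
    d = W.sum(axis=1)
    A = W / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector and scale by eigenvalue^t.
    return psi[:, 1:n_components + 1] * (vals[1:n_components + 1] ** t)

def laplacian_pyramid_extend(X_train, f_train, X_new, sigma0, levels=5):
    """Multiscale (Laplacian-pyramid style) extension of f_train to X_new."""
    def averaging_kernel(A, B, sigma):
        d2 = (np.sum(A ** 2, axis=1)[:, None]
              + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T)
        K = np.exp(-d2 / sigma)
        return K / K.sum(axis=1, keepdims=True)  # row-stochastic smoothing operator

    residual = np.asarray(f_train, dtype=float).copy()
    f_new = np.zeros(X_new.shape[0])
    sigma = sigma0
    for _ in range(levels):
        # Approximate the current residual at scale sigma on the training set,
        # add the matching correction at the new points, then halve the scale.
        K_tt = averaging_kernel(X_train, X_train, sigma)
        K_nt = averaging_kernel(X_new, X_train, sigma)
        f_new += K_nt @ residual
        residual = residual - K_tt @ residual
        sigma /= 2.0
    return f_new

# Hypothetical usage: embed synthetic data, then extend a target function y
# from the first 150 points to the remaining 50.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0])
emb = diffusion_maps(X, epsilon=5.0)
# Choose the extension scale on the order of the median pairwise squared distance.
sig0 = np.median(np.sum((emb[:150, None, :] - emb[None, :150, :]) ** 2, axis=-1))
y_hat = laplacian_pyramid_extend(emb[:150], y[:150], emb[150:], sigma0=sig0)
```

A typical use, as in the snippet above, is to embed the (normalized) data with the diffusion maps and then extend a task-related function from the training points to new points with the Laplacian pyramid, coarse scales first and finer scales on the residuals.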
