Random spatial subspace clustering

Strong spatial or temporal correlation exists in many types of data, for example, the hyperspectral data acquired by a spectrometer scanning rock samples from a drill hole. It is of practical interest to identify spatially continuous segments in a data set whose samples are known a priori to be strongly correlated spatially. Recently, a novel method called spatial subspace clustering (SpatSC) was proposed to address this problem. However, due to the subspace learning nature of the SpatSC model, the method becomes intractable when the number of samples to be processed is very large. To reduce the computational burden, we proposed a method called random spatial subspace clustering, or RSSC for short. In RSSC, only a subset of the data is segmented by SpatSC, and an overall solution is obtained through propagation, which reduces the computational cost significantly. An important question, however, is to what extent the RSSC solution differs from that of SpatSC. In this paper, we analyse the propagation procedure and derive the average error rate of the RSSC solution relative to the SpatSC solution on the whole data set. The results show that the RSSC clustering result is close to the SpatSC result under mild conditions, which provides a theoretical performance guarantee for RSSC. Our analysis also reveals that the guided random sampling, implemented by crude spatial clustering, is crucial in improving RSSC results. We evaluate RSSC quantitatively on various data sets to assess its effectiveness under different settings. The results show that RSSC performs similarly to SpatSC, as indicated by the theory, while its computational cost is only a fraction of that of SpatSC.
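The sample-then-propagate idea described above can be illustrated with a minimal sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: the samples are taken to be ordered along one spatial axis (as along a drill hole), the crude spatial clustering is replaced by equal-length contiguous blocks, SpatSC is replaced by a hypothetical stand-in (`standin_segmenter`) that starts a new segment at large jumps between consecutive sampled spectra, and labels are propagated to each unsampled position from its nearest sampled neighbour.

```python
import numpy as np


def crude_blocks(n, n_blocks):
    """Hypothetical crude spatial clustering: equal-length contiguous blocks."""
    return np.array_split(np.arange(n), n_blocks)


def guided_sample(blocks, per_block, rng):
    """Draw a few positions from every block so that no region is missed."""
    picks = [rng.choice(b, size=min(per_block, len(b)), replace=False)
             for b in blocks]
    return np.sort(np.concatenate(picks))


def standin_segmenter(X_sub, jump=2.5):
    """Stand-in for SpatSC (NOT the real model): a new segment starts
    wherever consecutive sampled rows differ by more than `jump`."""
    steps = np.linalg.norm(np.diff(X_sub, axis=0), axis=1)
    return np.concatenate([[0], np.cumsum(steps > jump)])


def propagate(sample_idx, sample_labels, n):
    """Give every position the label of its nearest sampled position."""
    pos = np.arange(n)
    right = np.searchsorted(sample_idx, pos).clip(0, len(sample_idx) - 1)
    left = (right - 1).clip(0)
    use_left = np.abs(pos - sample_idx[left]) <= np.abs(sample_idx[right] - pos)
    return sample_labels[np.where(use_left, left, right)]


# Synthetic data: three spatially contiguous segments with different means.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1, 2], 100)
X = truth[:, None] * 5.0 + 0.1 * rng.standard_normal((300, 4))

# Segment only a guided random subset, then propagate to all samples.
sample_idx = guided_sample(crude_blocks(len(X), n_blocks=6), per_block=10, rng=rng)
sample_labels = standin_segmenter(X[sample_idx])
labels = propagate(sample_idx, sample_labels, len(X))

print("samples segmented directly:", len(sample_idx), "of", len(X))
print("agreement with ground truth:", (labels == truth).mean())
```

In this toy setting, disagreements can only arise in the gaps between the last sampled position of one segment and the first sampled position of the next, which gives an informal picture of why the sampling scheme matters for the propagated result.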
