Outlier Cluster Formation in Spectral Clustering

Outlier detection and cluster number estimation are important issues when clustering real data. This paper focuses on spectral clustering, a time-tested clustering method, and reveals important properties of the method related to outliers. The paper highlights two mathematical observations: first, the intrinsic property of spectral clustering to form an outlier cluster, and second, the singularity of an outlier cluster with a valid cluster number. Based on these observations, we designed a function that evaluates clustering and outlier detection results. In experiments, we prepared two scenarios: face clustering in a photo album and person re-identification in a camera network. We confirmed that the proposed method detects outliers and estimates the number of clusters properly in both problems. Our method outperforms state-of-the-art methods in both the 128-dimensional sparse space for face clustering and the 4,096-dimensional non-sparse space for person re-identification.
