An Automated Spectral Clustering for Multi-scale Data

Abstract: Spectral clustering algorithms typically require a priori selection of input parameters such as the number of clusters, a scaling parameter for the affinity measure, or ranges of these values for parameter tuning. Despite efforts to automate spectral clustering, the task of grouping data in multi-scale and higher-dimensional spaces remains largely unexplored. This study presents a heuristic spectral clustering algorithm that requires no user input because it estimates the parameters from the data itself. Specifically, it introduces the heuristic of iterative eigengap search with (1) global scaling and (2) local scaling. Both approaches estimate the scaling parameter and apply iterative eigengap quantification along a search tree to reveal dissimilarities at different scales of a feature space and identify clusters. The performance of these approaches was tested on real-world datasets of two kinds: multi-scale power variation data and gene expression data. Our findings show that iterative eigengap search with a PCA-based global scaling scheme discovers different patterns with an accuracy above 90% in most cases, without requiring any a priori input.
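To make the eigengap idea concrete, the following is a minimal sketch of a single eigengap pass with one global scaling parameter. It is not the algorithm of this paper: the abstract does not specify the PCA-based scaling scheme or the search tree, so the sketch assumes a stand-in global scale (the mean distance to each point's 7th nearest neighbor) and performs no iterative search. The function name `estimate_num_clusters` and the parameters `knn` and `max_k` are hypothetical.

```python
import numpy as np

def estimate_num_clusters(X, knn=7, max_k=10):
    """Estimate the number of clusters with the eigengap heuristic.

    Assumption: the global scale sigma is the mean distance to the knn-th
    nearest neighbor. This is a stand-in, not the paper's PCA-based scheme.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances (O(n^2) memory, fine for small n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Global scale: column 0 of each sorted row is the point itself.
    knn = min(knn, n - 1)
    sigma = np.sort(dist, axis=1)[:, knn].mean()

    # Gaussian affinity with zero self-affinity.
    W = np.exp(-dist ** 2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    deg = np.maximum(W.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Eigenvalues in ascending order; the largest gap among the smallest
    # eigenvalues suggests the number of clusters.
    eigvals = np.linalg.eigvalsh(L)
    max_k = min(max_k, n - 1)
    gaps = np.diff(eigvals[: max_k + 1])
    k = int(np.argmax(gaps)) + 1
    return k, sigma

# Toy data: three tight 3x3 grids placed far apart.
grid = np.array([[0.1 * i, 0.1 * j] for i in range(3) for j in range(3)])
X_demo = np.vstack([grid, grid + [5.0, 0.0], grid + [0.0, 5.0]])
k_hat, sigma_hat = estimate_num_clusters(X_demo)
print(k_hat, sigma_hat)
```

A single global scale like this one suits data whose clusters have similar spread. When cluster scales differ widely, one sigma cannot fit all of them, which is the multi-scale setting that the iterative search and local scaling described in the abstract are meant to address.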
