Generalized Independent Subspace Clustering

Data can encapsulate different object groupings in subspaces of arbitrary dimension and orientation. Finding such subspaces and the groupings within them is the goal of generalized subspace clustering. In this work we present a generalized subspace clustering technique capable of finding multiple non-redundant clusterings in arbitrarily-oriented subspaces. We use Independent Subspace Analysis (ISA) to find the subspace collection that minimizes the statistical dependency (redundancy) between clusterings. We then cluster in the arbitrarily-oriented subspaces identified by ISA. Our algorithm ISAAC (Independent Subspace Analysis and Clustering) uses the Minimum Description Length principle to automatically choose parameters that are otherwise difficult to set. We comprehensively demonstrate the effectiveness of our approach on synthetic and real-world data.

[1]  Ira Assent,et al.  Relevant Subspace Clustering: Mining the Most Interesting Non-redundant Concepts in High Dimensional Data , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[2]  Thomas Seidl,et al.  SMVC: semi-supervised multi-view clustering in subspace projections , 2014, KDD.

[3]  Hans-Peter Kriegel,et al.  Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering , 2009, TKDD.

[4]  B. Silverman Density estimation for statistics and data analysis , 1986 .

[5]  Bernard W. Silverman,et al.  Density Estimation for Statistics and Data Analysis , 1987 .

[6]  Michael I. Jordan,et al.  Multiple Non-Redundant Spectral Clustering Views , 2010, ICML.

[7]  James Bailey,et al.  COALA: A Novel Approach for the Extraction of an Alternate Clustering of High Quality and High Dissimilarity , 2006, Sixth International Conference on Data Mining (ICDM'06).

[8]  James Bailey,et al.  A hierarchical information theoretic technique for the discovery of non linear alternative clusterings , 2010, KDD.

[9]  Jorma Rissanen,et al.  Information and Complexity in Statistical Modeling , 2006, ITW.

[10]  Aapo Hyvärinen,et al.  Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces , 2000, Neural Computation.

[11]  Xuan Vinh Nguyen,et al.  minCEntropy: A Novel Information Theoretic Approach for the Generation of Alternative Clusterings , 2010, 2010 IEEE International Conference on Data Mining.

[12]  María Vanrell,et al.  Texton theory revisited: A bag-of-words approach to combine textons , 2012, Pattern Recognit..

[13]  Jian Pei,et al.  Finding multiple stable clusterings , 2016, Knowledge and Information Systems.

[14]  Arnold W. M. Smeulders,et al.  The Amsterdam Library of Object Images , 2004, International Journal of Computer Vision.

[15]  Andrew W. Moore,et al.  Nonparametric Density Estimation: Toward Computational Tractability , 2003, SDM.

[16]  Bernhard Liebl,et al.  Very high compliance in an expanded MS-MS-based newborn screening program despite written parental consent. , 2002, Preventive medicine.

[17]  Thomas Seidl,et al.  Multi-view clustering using mixture models in subspace projections , 2012, KDD.

[18]  Barnabás Póczos,et al.  Separation theorem for independent subspace analysis and its consequences , 2012, Pattern Recognit..

[19]  Elke Achtert,et al.  Evaluation of Clusterings -- Metrics and Visual Support , 2012, 2012 IEEE 28th International Conference on Data Engineering.

[20]  Ira Assent,et al.  INSCY: Indexing Subspace Clusters with In-Process-Removal of Redundancy , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[21]  Ying Cui,et al.  Non-redundant Multi-view Clustering via Orthogonalization , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[22]  Jörg Sander,et al.  Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering , 2008, KDD.

[23]  Ira Assent,et al.  Evaluating Clustering in Subspace Projections of High Dimensional Data , 2009, Proc. VLDB Endow..