Subspace correlation clustering: finding locally correlated dimensions in subspace projections of the data

The need to analyze subspace projections of complex data is well established in the clustering community. The full space may be obscured by overlapping patterns and irrelevant dimensions, so that only certain subspaces reveal the clustering structure. Subspace clustering discards irrelevant dimensions and, because each set of objects has its own subspace projection, allows objects to belong to multiple, overlapping clusters. As we will demonstrate, the observations that motivate subspace projections in traditional clustering also apply to correlation analysis. In this work, we introduce the novel paradigm of subspace correlation clustering: we analyze subspace projections to find subsets of objects that show linear correlations among a subset of dimensions. In contrast to existing techniques, which determine correlations based on the full space, our method can exclude locally irrelevant dimensions, enabling more precise detection of the correlated features. Since we analyze subspace projections, each object can contribute to several correlations. Our model allows multiple overlapping clusters in general but avoids redundant clusters that can be deduced from already known correlations. We introduce the algorithm SSCC, which exploits different pruning techniques to efficiently generate a subspace correlation clustering. In thorough experiments, we demonstrate the strength of our novel paradigm in comparison to existing methods.
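The idea of a correlation that holds only for a subset of objects in a subset of dimensions can be illustrated with a small sketch. This is not the SSCC algorithm; it is a toy check, under simplifying assumptions, that uses the absolute Pearson coefficient on a pair of dimensions as a stand-in for "linear correlation in a subspace". The data set `DATA`, the object sets `CLUSTER_A` and `CLUSTER_B`, and the helper `subspace_correlation` are hypothetical names introduced here for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes both sequences have non-zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def subspace_correlation(data, objects, dims):
    """Absolute Pearson correlation of the given objects, restricted to a
    two-dimensional subspace dims = (d1, d2). Illustrative measure only."""
    d1, d2 = dims
    xs = [data[i][d1] for i in objects]
    ys = [data[i][d2] for i in objects]
    return abs(pearson(xs, ys))


# Hypothetical 3-dimensional toy data (one tuple per object).
DATA = [
    (1.0, 3.0, 5.0),
    (2.0, 5.0, 1.0),
    (3.0, 7.0, 4.0),
    (4.0, 9.0, 2.0),
    (5.0, 11.0, 3.0),
    (2.0, 13.0, 4.0),
    (9.0, 15.0, 5.0),
    (1.0, 17.0, 6.0),
]

# Objects 0-4 satisfy x1 = 2*x0 + 1; dimension 2 is irrelevant for them.
CLUSTER_A = [0, 1, 2, 3, 4]
# Objects 3-7 satisfy x2 = 0.5*x1 - 2.5; dimension 0 is irrelevant for them.
CLUSTER_B = [3, 4, 5, 6, 7]

if __name__ == "__main__":
    print("A in dims (0, 1):", subspace_correlation(DATA, CLUSTER_A, (0, 1)))
    print("A in dims (0, 2):", subspace_correlation(DATA, CLUSTER_A, (0, 2)))
    print("B in dims (1, 2):", subspace_correlation(DATA, CLUSTER_B, (1, 2)))
    print("B in dims (0, 1):", subspace_correlation(DATA, CLUSTER_B, (0, 1)))
    print("objects in both clusters:", sorted(set(CLUSTER_A) & set(CLUSTER_B)))
```

In this toy data, each object set is strongly correlated only in its own pair of dimensions, and objects 3 and 4 take part in both correlations, which mirrors the overlapping clusters described above. A pairwise coefficient covers only two-dimensional subspaces; subspaces with more dimensions would need a different measure.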
