Oriented k-windows: A PCA driven clustering method

In this paper we apply Principal Component Analysis (PCA) to subsets of the dataset in order to better approximate clusters. We focus on a specific density-based clustering algorithm, k-Windows, that holds particular promise for problems of moderate dimensionality. We show that the resulting algorithm, which we call Oriented k-Windows (OkW), steers the clustering procedure by effectively capturing several coexisting clusters of different orientations. OkW combines techniques from computational geometry and numerical linear algebra and appears to be particularly effective on difficult datasets of moderate dimensionality.
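The core idea of applying PCA to a subset of the data can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `oriented_window` and `in_window` are hypothetical, the synthetic data are invented for the illustration, and the window movement and enlargement steps of k-Windows are not shown. The sketch only fits a box whose sides follow the principal directions of the points it encloses, and compares its size with an axis-aligned box around the same points.

```python
import numpy as np

def oriented_window(points):
    """Fit a box aligned with the principal directions of `points` (n, d).

    Returns (center, axes, half_widths); the columns of `axes` are the
    principal directions. Hypothetical helper for illustration only.
    """
    mean = points.mean(axis=0)
    centered = points - mean
    # Rows of vt are the principal directions of the centered subset.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt.T
    # Coordinates of the points in the rotated (PCA) frame.
    coords = centered @ axes
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    # Place the box center at the midpoint of the extents in that frame.
    center = mean + axes @ ((lo + hi) / 2.0)
    half_widths = (hi - lo) / 2.0
    return center, axes, half_widths

def in_window(points, center, axes, half_widths, tol=1e-9):
    """Boolean mask of the points lying inside the oriented box."""
    coords = (points - center) @ axes
    return np.all(np.abs(coords) <= half_widths + tol, axis=1)

# Synthetic elongated cluster, rotated by 45 degrees (assumed test data).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2)) * np.array([5.0, 0.3])
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
cluster = base @ rotation.T

center, axes, half_widths = oriented_window(cluster)
inside = in_window(cluster, center, axes, half_widths)

oriented_area = float(np.prod(2.0 * half_widths))
aligned_area = float(np.prod(cluster.max(axis=0) - cluster.min(axis=0)))
print("oriented box area:    ", oriented_area)
print("axis-aligned box area:", aligned_area)
```

For a cluster that is elongated along a diagonal, the oriented box encloses the same points with a much smaller area than the axis-aligned box, which is the intuition behind orienting the windows.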
