Projected memory clustering

Abstract: We present PMC (Projected Memory Clustering), a new algorithm for projected clustering of high-dimensional data. It discovers clusters described by affine subspaces parallel to the main axes of the coordinate system. The number of clusters and the dimensions of the subspaces are selected automatically. Experiments on various types of data show that PMC detects clustering structures better than related projected clustering methods. Moreover, it is fast, which makes it suitable for practical use.
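To make "affine subspaces parallel to the main axes" concrete, the sketch below fits one such subspace to a single group of points. It is an illustration under stated assumptions, not the PMC algorithm: the function names (`fit_axis_parallel_subspace`, `project`, `mean_squared_error`) are hypothetical, the subspace dimension `dim` is passed in by hand rather than selected automatically, and the least-squares criterion is assumed for the example. Under that criterion, the best axis-parallel subspace leaves the highest-variance coordinates free and fixes the remaining ones at the cluster mean.

```python
from statistics import fmean, pvariance


def fit_axis_parallel_subspace(points, dim):
    """Fit an axis-parallel affine subspace of dimension `dim` (illustrative only).

    Returns the cluster mean and the sorted indices of the free coordinates.
    Coordinates not listed as free are fixed at their mean value.
    """
    columns = list(zip(*points))
    mean = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    # Keep the `dim` coordinates with the largest variance free.
    order = sorted(range(len(columns)), key=lambda j: variances[j], reverse=True)
    free = sorted(order[:dim])
    return mean, free


def project(point, mean, free):
    """Project a point onto the subspace: free coordinates are kept,
    all others are replaced by the mean."""
    projected = list(mean)
    for j in free:
        projected[j] = point[j]
    return projected


def mean_squared_error(points, mean, free):
    """Average squared distance between the points and their projections."""
    total = 0.0
    for p in points:
        q = project(p, mean, free)
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total / len(points)


if __name__ == "__main__":
    # Hypothetical toy cluster: it varies only along the first axis.
    cluster = [(0.0, 5.0, 1.0), (10.0, 5.0, 1.0), (20.0, 5.0, 1.0)]
    mean, free = fit_axis_parallel_subspace(cluster, dim=1)
    print("free coordinates:", free)
    print("projection error:", mean_squared_error(cluster, mean, free))
```

A cluster represented this way needs only its mean and its list of free coordinates, which is why the dimension of each subspace matters for how compactly the data can be described.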
