An Efficient Active Constraint Selection Algorithm for Clustering

In this paper, we address the problem of active query selection for clustering with constraints. The objective is to automatically determine a set of queries and their associated must-link and cannot-link constraints that help constraint-based clustering algorithms converge. Some work on active constraint learning has already been proposed, but it applies only to K-Means-like clustering algorithms, which are known to be limited to spherical clusters. We are instead interested in constraint-based clustering algorithms that deal with clusters of arbitrary shapes and sizes (such as Constrained-DBSCAN and Constrained-Hierarchical Clustering). Our novel approach relies on a k-nearest neighbors graph to estimate the dense regions of the data space and generates queries at the frontier between clusters, where cluster membership is most uncertain. Experiments show that our framework improves the performance of constraint-based clustering algorithms.
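The general idea of using a k-nearest neighbors graph to find uncertain regions and turn them into pairwise queries can be illustrated with a minimal sketch. This is not the algorithm of the paper, whose details are not given in the abstract. It assumes a simple heuristic: local density is the inverse of the mean distance to the k nearest neighbors, the sparsest points are treated as likely frontier points, and each one is paired with its nearest neighbor to form a query. The function names (`knn_graph`, `local_density`, `select_queries`, `answer`) and the label-based oracle are hypothetical and chosen only for illustration.

```python
import math


def knn_graph(points, k):
    """For each point, return its k nearest neighbours as (distance, index) pairs."""
    graph = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph.append(dists[:k])
    return graph


def local_density(graph):
    """Assumed density estimate: inverse of the mean k-NN distance."""
    eps = 1e-12  # avoids division by zero when points coincide
    return [1.0 / (sum(d for d, _ in nbrs) / len(nbrs) + eps) for nbrs in graph]


def select_queries(points, k, n_queries):
    """Pick up to n_queries pairs (i, j), starting from the sparsest points.

    Assumption: sparse points lie near cluster frontiers, so asking about
    a sparse point and its nearest neighbour is likely to be informative.
    """
    graph = knn_graph(points, k)
    density = local_density(graph)
    order = sorted(range(len(points)), key=lambda i: (density[i], i))
    queries, seen = [], set()
    for i in order:
        j = graph[i][0][1]  # index of the nearest neighbour of i
        pair = (min(i, j), max(i, j))
        if pair not in seen:
            seen.add(pair)
            queries.append(pair)
        if len(queries) == n_queries:
            break
    return queries


def answer(pair, labels):
    """Hypothetical oracle: turns ground-truth labels into a constraint."""
    i, j = pair
    return "must-link" if labels[i] == labels[j] else "cannot-link"


# Two compact groups and one isolated point lying between them.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 0.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0]
for pair in select_queries(points, k=2, n_queries=3):
    print(pair, answer(pair, labels))
```

In this toy data set the isolated point between the two groups has the lowest estimated density, so it is the first point queried. The resulting must-link and cannot-link answers could then be passed to a constraint-based clustering algorithm; the brute-force neighbor search here is quadratic and only meant for small examples.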
