Active Constrained Clustering via non-iterative uncertainty sampling

Active Constraint Learning (ACL) is steadily gaining popularity in constrained clustering because it achieves performance gains while requiring only minimal feedback from a human annotator on selected instances. In constrained clustering algorithms, this feedback is incorporated in the form of Must-Link (ML) and Cannot-Link (CL) constraints. Existing iterative uncertainty reduction schemes impose a high computational burden, particularly on the larger datasets that are common in computer vision and visual learning applications. In scenarios where multiple agents (i.e., robots) require user feedback to perform recognition tasks, minimizing the interaction between the user and the agents without compromising performance is essential. In this study, a non-iterative ACL scheme with proven performance benefits is presented. We demonstrate the effectiveness of our methodology by building on the well-known K-Means clustering algorithm; it can easily be extended to alternative clustering schemes. The proposed methodology uses Silhouette values, conventionally a measure of clustering performance, to rank samples by their information content. In addition, an efficient greedy selection scheme is devised for choosing the most informative samples for human annotation. To the best of our knowledge, this is the first active constrained clustering methodology able to process the computer vision datasets targeted in this study. Performance results on various computer vision benchmarks support the merits of adopting the proposed scheme.
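As a rough illustration of the idea of ranking samples by Silhouette value, the following sketch computes the standard Silhouette value s(i) = (b - a) / max(a, b) for each sample of an existing clustering and then picks query candidates greedily. It is not the selection scheme of the paper, whose details are not given in the abstract. Two assumptions are made here: that a low Silhouette value marks a sample as uncertain and therefore informative, and that a simple distance-based filter (the hypothetical `min_gap` parameter) is a reasonable stand-in for avoiding redundant queries. The function names `silhouette_values` and `select_queries` are also hypothetical.

```python
import math


def silhouette_values(points, labels):
    """Return the Silhouette value of every point (needs >= 2 clusters)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    values = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            values.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


def select_queries(points, labels, budget, min_gap=0.0):
    """Greedily pick up to `budget` samples with the lowest Silhouette values.

    ASSUMPTION: low Silhouette value = uncertain assignment = informative.
    ASSUMPTION: `min_gap` (hypothetical) skips candidates that lie closer
    than this distance to an already selected sample.
    """
    scores = silhouette_values(points, labels)
    ranked = sorted(range(len(points)), key=lambda i: scores[i])
    chosen = []
    for i in ranked:
        if len(chosen) == budget:
            break
        if all(math.dist(points[i], points[j]) >= min_gap for j in chosen):
            chosen.append(i)
    return chosen


# Toy data: two tight groups plus one point lying between them.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
labels = [0, 0, 0, 1, 1, 1, 0]  # e.g. the output of a K-Means run
queries = select_queries(points, labels, budget=2)
```

The selection runs once over a single clustering result, which mirrors the non-iterative setting described above: no re-clustering happens between queries. The annotator's answers for the returned indices would then be turned into ML and CL constraints for a constrained K-Means variant.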
