Clustering with Instance and Attribute Level Side Information

Selecting a suitable proximity measure is one of the fundamental tasks in clustering. An essential problem in metric learning is how to use all available side information effectively, including instance-level information in the form of pairwise constraints and attribute-level information in the form of attribute order preferences. In this paper, we propose a learning framework that incorporates both the pairwise constraints and the attribute order preferences simultaneously. The underlying theory and the related parameter-adjustment technique are described in detail. Experimental results on benchmark data sets demonstrate the effectiveness of the proposed method.
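To make the two kinds of side information concrete, the sketch below scores a candidate metric against both. It is only an illustration under stated assumptions, not the framework proposed in the paper: the abstract does not specify the metric or the objective, so the diagonal (attribute-weighted) squared Euclidean distance, the hinge-style penalty, and the names `weighted_sq_dist`, `side_info_penalty`, and `margin` are all hypothetical.

```python
def weighted_sq_dist(x, y, w):
    """Squared Euclidean distance with one non-negative weight per attribute."""
    return sum(wk * (xk - yk) ** 2 for xk, yk, wk in zip(x, y, w))


def side_info_penalty(X, w, must_link, cannot_link, order_prefs, margin=1.0):
    """Hypothetical penalty for a weight vector w; lower is better.

    must_link / cannot_link: instance-level pairs (i, j) of row indices in X.
    order_prefs: attribute-level pairs (a, b) meaning "attribute a should
    weigh at least as much as attribute b".
    """
    penalty = 0.0
    # Must-link pairs should be close under the learned metric.
    for i, j in must_link:
        penalty += weighted_sq_dist(X[i], X[j], w)
    # Cannot-link pairs should be at least `margin` apart (hinge penalty).
    for i, j in cannot_link:
        penalty += max(0.0, margin - weighted_sq_dist(X[i], X[j], w))
    # Each violated order preference costs the size of the violation.
    for a, b in order_prefs:
        penalty += max(0.0, w[b] - w[a])
    return penalty


# Toy data: three points with two attributes.
X = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
must_link = [(0, 1)]      # points 0 and 1 belong together
cannot_link = [(0, 2)]    # points 0 and 2 belong apart
order_prefs = [(0, 1)]    # attribute 0 matters at least as much as attribute 1

good = side_info_penalty(X, [1.0, 0.1], must_link, cannot_link, order_prefs)
bad = side_info_penalty(X, [0.1, 1.0], must_link, cannot_link, order_prefs)
print(good < bad)
```

Weights that emphasize attribute 0 agree with all three pieces of side information at once, so they receive the smaller penalty. A metric-learning procedure would search for such weights instead of comparing two hand-picked candidates.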
