Active Learning from Relative Comparisons

This work focuses on active learning from relative comparison information. A relative comparison specifies, for a data triplet $(x_i, x_j, x_k)$, that instance $x_i$ is more similar to $x_j$ than to $x_k$. Such constraints, when available, have been shown to be useful for learning tasks such as learning appropriate distance metrics or finding good clustering solutions. In real-world applications, acquiring constraints often requires considerable human effort, because the user must manually inspect the instances. This motivates us to study how to select and query the most useful relative comparisons so as to learn effectively with minimal user effort. Given an underlying class concept that the user employs to provide such constraints, we present an information-theoretic criterion that selects the triplet whose answer leads to the highest expected information gain about the classes of a set of examples. Directly applying the proposed criterion requires examining $O(n^3)$ triplets for $n$ instances, which is prohibitive even for datasets of moderate size. We show that a randomized selection strategy can reduce the selection pool from $O(n^3)$ to $O(n)$ with minimal loss in efficiency, allowing us to scale up to considerably larger problems. Experiments show that the proposed method consistently outperforms baseline policies.
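To make the selection criterion concrete, here is a minimal sketch, not the authors' implementation, under several stated assumptions. Each instance has a current class-probability vector, the labels of the three instances are treated as independent, and the user answers deterministically from the labels: "yes" if $y_i = y_j \neq y_k$, "no" if $y_i = y_k \neq y_j$, and "don't know" otherwise. Because the answer is then a deterministic function of the labels, the expected information gain about the labels equals the entropy of the answer distribution. The randomized step shown here (scoring a uniform sample of `pool_factor * n` triplets) is a hypothetical stand-in that only illustrates shrinking the pool to $O(n)$; all function names and `pool_factor` are illustrative.

```python
import math
import random


def answer_distribution(p_i, p_j, p_k):
    """Probabilities of 'yes', 'no', and 'don't know' for the query
    'is x_i more similar to x_j than to x_k?' (assumes independent labels
    and a user who answers deterministically from the class labels)."""
    # yes: y_i == y_j == c and y_k != c, summed over classes c
    p_yes = sum(a * b * (1.0 - c) for a, b, c in zip(p_i, p_j, p_k))
    # no: y_i == y_k == c and y_j != c, summed over classes c
    p_no = sum(a * c * (1.0 - b) for a, b, c in zip(p_i, p_j, p_k))
    # don't know: all three labels equal, or y_i differs from both
    p_dk = max(0.0, 1.0 - p_yes - p_no)
    return p_yes, p_no, p_dk


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def expected_info_gain(p_i, p_j, p_k):
    # The answer A is a deterministic function of the labels Y, so
    # I(A; Y) = H(A) - H(A | Y) = H(A).
    return entropy(answer_distribution(p_i, p_j, p_k))


def select_triplet(probs, pool_factor=5, rng=None):
    """Return the best triplet among pool_factor * n uniformly sampled ones,
    instead of scoring all O(n^3) triplets (hypothetical sampling scheme)."""
    if rng is None:
        rng = random.Random(0)
    n = len(probs)
    best, best_gain = None, -1.0
    for _ in range(pool_factor * n):
        i, j, k = rng.sample(range(n), 3)  # three distinct instances
        gain = expected_info_gain(probs[i], probs[j], probs[k])
        if gain > best_gain:
            best, best_gain = (i, j, k), gain
    return best, best_gain
```

For example, with two classes and uniform probabilities `[0.5, 0.5]` for all three instances, the answer probabilities are 0.25, 0.25, and 0.5, so the expected gain is 1.5 bits; when all three labels are already certain, the answer is known in advance and the gain is 0. Scoring a linear-size random pool keeps each selection step at $O(n)$ evaluations, at the cost of possibly missing the single best triplet.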