Nuclear Discrepancy for Active Learning

Active learning algorithms propose which unlabeled objects should be queried for their labels in order to improve a predictive model the most. We study active learners that minimize generalization bounds and uncover relationships between these bounds that lead to an improved approach to active learning. In particular, we show the relations among the bound of the state-of-the-art Maximum Mean Discrepancy (MMD) active learner, the bound of the Discrepancy, and a new, looser bound that we refer to as the Nuclear Discrepancy bound. We motivate this bound by a probabilistic argument: we show that it accounts for situations that are more likely to occur. Our experiments indicate that active learning using the tightest bound, the Discrepancy, performs worst in terms of squared loss, while the loosest bound, our proposed Nuclear Discrepancy, performs best overall. We confirm our probabilistic argument empirically: the other bounds focus on pessimistic scenarios that are rare in practice. We conclude that tightness of bounds is not always of primary importance and that active learning methods should concentrate on realistic scenarios in order to improve performance.
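To make concrete what "minimizing a generalization bound" means for query selection, the sketch below greedily queries the unlabeled points whose inclusion minimizes the empirical squared MMD between the queried set and the full pool, MMD²(X, Y) = mean(K_XX) + mean(K_YY) − 2·mean(K_XY). This is a minimal illustration only, not the authors' implementation: the RBF kernel, the bandwidth gamma, and the helper names rbf_kernel, mmd_squared, and greedy_mmd_select are our own assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel matrix between the rows of A and the rows of B.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd_squared(X, Y, gamma=1.0):
    # Biased empirical estimate of the squared MMD between samples X and Y.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

def greedy_mmd_select(pool, budget, gamma=1.0):
    # Greedily pick points whose inclusion keeps the queried set
    # closest (in MMD) to the full unlabeled pool.
    selected, remaining = [], list(range(len(pool)))
    for _ in range(budget):
        best_i, best_val = None, np.inf
        for i in remaining:
            val = mmd_squared(pool[selected + [i]], pool, gamma)
            if val < best_val:
                best_i, best_val = i, val
        selected.append(best_i)
        remaining.remove(best_i)
    return selected

# Usage: query 5 points from a 2-D toy pool (hypothetical data).
rng = np.random.default_rng(0)
pool = rng.normal(size=(100, 2))
print(greedy_mmd_select(pool, budget=5))
```

Greedy forward selection is used here as a cheap surrogate for the combinatorial problem of choosing the bound-minimizing batch; the same loop structure applies if mmd_squared is swapped for another bound-derived criterion such as the Discrepancy or the Nuclear Discrepancy.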
