Classification from Pairwise Similarity and Unlabeled Data

Supervised learning requires large amounts of labeled data, which becomes a bottleneck when privacy concerns arise or labeling is costly. To address this, we propose SU classification, a weakly supervised learning setting that requires only similar (S) data pairs (two examples known to belong to the same class) and unlabeled (U) data points instead of fully labeled data. We show that an unbiased estimator of the classification risk can be obtained from SU data alone, and that the estimation error of its empirical risk minimizer achieves the optimal parametric convergence rate. Finally, we demonstrate the effectiveness of the proposed method through experiments.
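
The abstract does not spell out the estimator, so the following is only a minimal sketch of how a risk estimate could be assembled from S pairs and U points. It rests on assumptions that are not stated above: binary labels in {+1, -1}, a known positive class prior `prior_pos` different from 1/2, similar pairs whose two points are drawn independently from the same class, and a logistic loss. All names (`su_risk`, `sample_toy`, the toy Gaussian data) are hypothetical. The rewriting used here follows from matching the class-conditional terms of the ordinary risk, and the toy run compares it with a fully supervised estimate.

```python
import math
import random


def logistic_loss(z, t):
    """Logistic loss log(1 + exp(-t*z)), computed in a numerically stable way."""
    m = -t * z
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))


def su_risk(f, similar_pairs, unlabeled, prior_pos, loss=logistic_loss):
    """Estimate the classification risk of f from S pairs and U points only.

    Assumes prior_pos (the positive class prior) is known and not 0.5.
    """
    if abs(2.0 * prior_pos - 1.0) < 1e-12:
        raise ValueError("prior_pos = 0.5 makes the rewriting undefined")
    pi_p, pi_n = prior_pos, 1.0 - prior_pos
    pi_s = pi_p ** 2 + pi_n ** 2  # probability that a random pair is similar
    c = 1.0 / (2.0 * pi_p - 1.0)

    def s_loss(x):
        # Per-point loss applied to each member of a similar pair.
        z = f(x)
        return c * (loss(z, +1) - loss(z, -1))

    def u_loss(x):
        # Per-point loss applied to each unlabeled point.
        z = f(x)
        return c * (pi_p * loss(z, -1) - pi_n * loss(z, +1))

    s_term = sum(0.5 * (s_loss(a) + s_loss(b)) for a, b in similar_pairs)
    s_term /= len(similar_pairs)
    u_term = sum(u_loss(x) for x in unlabeled) / len(unlabeled)
    return pi_s * s_term + u_term


def sample_toy(n, prior_pos, rng):
    """Toy 1-D data (an assumption): class +1 ~ N(+1, 1), class -1 ~ N(-1, 1)."""
    pi_s = prior_pos ** 2 + (1.0 - prior_pos) ** 2
    pairs, unlabeled, labeled = [], [], []
    for _ in range(n):
        # A similar pair is positive-positive with probability pi_p^2 / pi_s.
        y = 1 if rng.random() < prior_pos ** 2 / pi_s else -1
        pairs.append((rng.gauss(y, 1.0), rng.gauss(y, 1.0)))
        y = 1 if rng.random() < prior_pos else -1
        unlabeled.append(rng.gauss(y, 1.0))
        # Labeled points are drawn only to check the SU estimate.
        y = 1 if rng.random() < prior_pos else -1
        labeled.append((rng.gauss(y, 1.0), y))
    return pairs, unlabeled, labeled


rng = random.Random(0)
pairs, unlabeled, labeled = sample_toy(100_000, 0.7, rng)
f = lambda x: 1.5 * x + 0.3  # an arbitrary fixed linear scorer
su = su_risk(f, pairs, unlabeled, prior_pos=0.7)
pn = sum(logistic_loss(f(x), y) for x, y in labeled) / len(labeled)
print(f"SU estimate: {su:.3f}   supervised estimate: {pn:.3f}")
```

On large samples the two printed numbers should be close. The factor 1 / (2 * prior_pos - 1) inflates the variance as the class prior approaches 1/2, which is why the sketch rejects that case outright.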
