Efficient Algorithms for Learning from Coarse Labels

For many learning problems one may not have access to fine-grained label information; e.g., an image can be labeled as husky, dog, or even animal depending on the expertise of the annotator. In this work, we formalize these settings and study the problem of learning from such coarse data. Instead of observing the actual labels from a set Z, we observe coarse labels corresponding to a partition of Z (or a mixture of partitions). Our main algorithmic result is that essentially any problem learnable from fine-grained labels can also be learned efficiently when the coarse data are sufficiently informative. We obtain our result through a generic reduction for answering Statistical Queries (SQ) over fine-grained labels given only coarse labels. The number of coarse labels required depends polynomially on the information distortion due to coarsening and the number of fine labels |Z|. We also investigate the case of (infinitely many) real-valued labels, focusing on a central problem in censored and truncated statistics: Gaussian mean estimation from coarse data. We provide an efficient algorithm when the sets in the partition are convex and establish that the problem is NP-hard even for very simple non-convex sets.
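To make the real-valued setting concrete, the following is a minimal sketch (not the paper's algorithm) of one-dimensional Gaussian mean estimation from coarse data, where the convex sets of the partition are intervals: we observe only the interval containing each draw and maximize the resulting coarse-data log-likelihood. The partition endpoints, the known unit variance, and the use of scipy.optimize.minimize_scalar are illustrative assumptions.

```python
# Minimal sketch: Gaussian mean estimation from coarse (interval) labels in 1D.
# Illustrative assumptions, not from the paper: known unit variance, a fixed
# partition of the real line into intervals, and MLE via scipy's scalar optimizer.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

true_mu = 1.7
edges = np.array([-np.inf, -1.0, 0.0, 1.0, 2.0, np.inf])  # convex sets = intervals

# Coarsening: draw z ~ N(true_mu, 1) but observe only the interval containing z.
z = rng.normal(true_mu, 1.0, size=5000)
cells = np.searchsorted(edges, z) - 1      # index of the interval containing each draw
lo, hi = edges[cells], edges[cells + 1]    # observed coarse labels: the interval (lo, hi)

def neg_log_likelihood(mu):
    # P(z in (a, b)) = Phi(b - mu) - Phi(a - mu) for a unit-variance Gaussian.
    p = norm.cdf(hi - mu) - norm.cdf(lo - mu)
    return -np.sum(np.log(np.clip(p, 1e-300, None)))

res = minimize_scalar(neg_log_likelihood, bounds=(-10, 10), method="bounded")
print(f"estimated mean: {res.x:.3f}  (true mean: {true_mu})")
```

Since the indicator of a convex set convolved with a Gaussian density is log-concave, each term of this objective is concave in the mean, so the one-dimensional MLE is well behaved; this is consistent with the dichotomy in the abstract between convex partitions (efficient algorithm) and non-convex ones (NP-hardness).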
