Less is more: Selecting informative and diverse subsets with balancing constraints

Deep learning has yielded extraordinary results in vision and natural language processing, but this achievement comes at a cost. Most models require enormous resources during training, both in computation and in human labeling effort. We show that we can identify informative and diverse subsets of data that lead to deep learning models whose performance is similar to that of models trained on the original dataset. Prior methods have exploited diversity and uncertainty in submodular objective functions for choosing subsets. In addition to these measures, we show that balancing constraints on predicted class labels and decision boundaries are beneficial. We propose a novel formulation of these constraints using matroids, an algebraic structure that generalizes linear independence in vector spaces, and present an efficient greedy algorithm with constant approximation guarantees. We outperform competing baselines on standard classification datasets such as CIFAR-10, CIFAR-100, and ImageNet, as well as on long-tailed datasets such as CIFAR-100-LT.
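To make the idea of greedy selection under a balancing constraint concrete, the following is a minimal sketch, not the paper's implementation. It assumes a facility-location objective as a stand-in for the submodular score and a partition matroid (at most `caps[c]` items per predicted class `c`) as one simple instance of a balancing constraint; the paper's actual objective and matroid formulation may differ. All function and variable names are hypothetical. For a monotone submodular objective under a single matroid constraint, the classical greedy analysis gives a 1/2-approximation.

```python
import math
from collections import Counter


def facility_location_gain(sim, best, j):
    """Marginal gain of adding item j to the selected set.

    best[i] is the highest similarity between item i and any item
    already selected; the gain is the total improvement in coverage.
    """
    return sum(max(sim[i][j] - best[i], 0.0) for i in range(len(best)))


def greedy_partition_matroid(sim, labels, caps, budget):
    """Greedy subset selection under a per-class cap (partition matroid).

    sim    : n x n matrix of non-negative similarities (assumption)
    labels : predicted class label of each item
    caps   : dict mapping class label -> maximum items allowed
    budget : maximum total number of items to select
    """
    n = len(sim)
    best = [0.0] * n
    counts = Counter()
    selected = []
    remaining = set(range(n))
    while len(selected) < budget:
        # Only items whose class still has room keep the set independent.
        feasible = [j for j in remaining if counts[labels[j]] < caps[labels[j]]]
        if not feasible:
            break
        # Highest marginal gain wins; ties go to the lowest index.
        j_star = max(
            feasible,
            key=lambda j: (facility_location_gain(sim, best, j), -j),
        )
        selected.append(j_star)
        remaining.remove(j_star)
        counts[labels[j_star]] += 1
        best = [max(best[i], sim[i][j_star]) for i in range(n)]
    return selected


# Toy data: four items of class 0 clustered near 0, two of class 1 near 5.
points = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1]
labels = [0, 0, 0, 0, 1, 1]
sim = [[math.exp(-abs(a - b)) for b in points] for a in points]

selected = greedy_partition_matroid(sim, labels, caps={0: 1, 1: 1}, budget=2)
print(selected, [labels[j] for j in selected])
```

With a cap of one item per class, the selection contains one item from each cluster even though class 0 has twice as many points. Raising the budget above the sum of the caps simply stops the loop early, since no feasible item remains.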
