GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training

Krishnateja Killamsetty¹, Durga Sivasubramanian², Baharan Mirzasoleiman³, Ganesh Ramakrishnan², Abir De², Rishabh Iyer¹

Department of Computer Science
¹The University of Texas at Dallas, Richardson, Texas, USA
²Indian Institute of Technology, Bombay, Mumbai, Maharashtra, India
³University of California, Los Angeles, Los Angeles, California, USA

krishnateja.killamsetty@utdallas.edu, durgas@cse.iitb.ac.in, baharan@cs.ucla.edu, ganesh@cse.iitb.ac.in, abir@cse.iitb.ac.in, rishabh.iyer@utdallas.edu
