Neural Active Learning with Performance Guarantees

We investigate the problem of active learning in the non-parametric regime under the streaming setting, where the labels are stochastically generated from a class of functions on which we make no assumptions whatsoever. We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines both the feature space the algorithm operates on and the learned model computed on top of it. Since the shape of the label-requesting threshold is tightly related to the complexity of the function to be learned, which is a priori unknown, we also derive a version of the algorithm that is agnostic to any such prior knowledge. This version relies on a regret balancing scheme to solve the resulting online model selection problem and is computationally efficient. We prove joint guarantees on the cumulative regret and the number of requested labels that depend on the complexity of the labeling function at hand. In the linear case, these guarantees recover known minimax rates for the generalization error as a function of the label complexity in the standard statistical learning setting.
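
To make the streaming selective-sampling mechanism concrete, below is a minimal sketch of such a loop, not the paper's algorithm: a frozen random-ReLU feature map stands in for the NTK gradient embedding, a ridge-regression estimate plays the role of the learned model, and a confidence-width rule decides when to request a label. All names, constants, and the threshold schedule are illustrative assumptions, and the data stream is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 10, 512            # input dimension and embedding width (assumed values)
W = rng.standard_normal((m, d)) / np.sqrt(d)   # frozen random first layer

def embed(x):
    """Random-ReLU feature map used as a stand-in for an NTK embedding."""
    return np.maximum(W @ x, 0.0) / np.sqrt(m)

lam, beta = 1.0, 1.0       # ridge parameter and confidence scaling (assumed)
A_inv = np.eye(m) / lam    # inverse of the regularized feature covariance
b = np.zeros(m)            # sum of label-weighted features
labels_requested = 0

for t in range(1, 1001):
    x = rng.standard_normal(d)                 # stand-in for the streamed instance
    phi = embed(x)

    A_inv_phi = A_inv @ phi
    pred = phi @ (A_inv @ b)                   # ridge-regression prediction at x
    width = beta * np.sqrt(phi @ A_inv_phi)    # confidence width at x

    if width > 1.0 / np.sqrt(t):               # label-request threshold (illustrative)
        y = np.tanh(x.sum()) + 0.1 * rng.standard_normal()   # toy noisy label
        # Sherman-Morrison rank-one update of the inverse covariance
        A_inv -= np.outer(A_inv_phi, A_inv_phi) / (1.0 + phi @ A_inv_phi)
        b += y * phi
        labels_requested += 1
    # otherwise: commit to `pred` and move on without querying the label

print(f"requested {labels_requested} labels out of 1000 rounds")
```

In this sketch the number of requested labels shrinks as the covariance fills out, mirroring the intended trade-off between cumulative regret and label complexity; the actual guarantees in the paper are stated for the NTK-based embedding and threshold, not for this simplified variant.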
