Adaptive Dropout Rates for Learning with Corrupted Features

Feature noising is an effective mechanism for reducing the risk of overfitting. To avoid an explosive search space, existing work typically assumes that all features share a single noise level, which is often chosen by cross-validation. In this paper, we present a Bayesian feature noising model that flexibly allows for dimension-specific or group-specific noise levels, and we derive a learning algorithm that adaptively updates these noise levels. The adaptive rule is simple and interpretable, drawing a direct connection to the fitness of each individual feature or feature group. Empirical results on various datasets demonstrate its effectiveness in avoiding extensive tuning, and its flexibility sometimes improves predictive performance as well.
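For intuition, the following is a minimal sketch of dimension-specific feature noising with adaptively updated dropout rates for a logistic-regression model. It assumes blankout (dropout) corruption and uses a simple weight-magnitude heuristic as a stand-in for the paper's Bayesian, fitness-based update, which is not reproduced here; all names (e.g. train_adaptive_dropout) are illustrative, not from the paper.

    # Sketch: per-dimension dropout rates adapted during training (assumptions noted above).
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_adaptive_dropout(X, y, n_epochs=20, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        q = np.full(d, 0.5)                    # per-dimension dropout probabilities
        for _ in range(n_epochs):
            # Corrupt each feature j independently with its own rate q[j].
            mask = rng.random((n, d)) > q      # keep feature j with probability 1 - q[j]
            Xc = X * mask / (1.0 - q)          # rescale so the corrupted features are unbiased
            p = sigmoid(Xc @ w)
            grad = Xc.T @ (p - y) / n          # logistic-loss gradient on corrupted data
            w -= lr * grad
            # Heuristic adaptation: corrupt "unfit" (low-|w|) features more aggressively.
            fitness = np.abs(w)
            q = np.clip(1.0 - fitness / (fitness.max() + 1e-12), 0.05, 0.95)
        return w, q

Calling train_adaptive_dropout(X, y) on a binary-labeled design matrix returns both the learned weights and the per-feature dropout rates, so dimensions that turn out to be uninformative end up with higher noise levels, mirroring the dimension-specific behavior described in the abstract.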
