Deep Bayesian Self-Training

Supervised deep learning has been highly successful in recent years, achieving state-of-the-art results in most tasks. However, as such methods are increasingly adopted in industrial applications, the requirement for large amounts of annotated data is often a challenge. In most real-world problems, manual annotation is practically infeasible due to time and labour constraints, so the development of automated and adaptive data annotation systems is highly sought after. In this paper, we propose (1) a deep Bayesian self-training methodology for automatic data annotation that leverages predictive uncertainty estimates obtained via variational inference and modern neural network (NN) architectures, and (2) a practical adaptation procedure for handling high label variability between different dataset distributions by clustering NN latent variable representations. An experimental study on both public and private datasets illustrates the superior performance of the proposed approach over standard self-training baselines and highlights the importance of predictive uncertainty estimates in safety-critical domains.
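The two components above lend themselves to short illustrations. Below is a minimal sketch of the self-training step, assuming Monte Carlo dropout as the variational approximation (in the spirit of Gal and Ghahramani), a PyTorch classifier whose stochasticity comes from nn.Dropout layers, and predictive entropy with a hypothetical threshold as the acceptance criterion; the paper's exact architecture and thresholding rule are not reproduced here.

```python
import torch
import torch.nn.functional as F

def mc_dropout_predict(model, x, n_samples=20):
    """Approximate the predictive distribution by averaging softmax outputs
    over stochastic forward passes with dropout left active (MC dropout)."""
    model.train()  # keeps nn.Dropout active; assumes no BatchNorm, or that it is frozen
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)  # predictive mean, shape (batch, classes)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    return mean_probs, entropy

def pseudo_label_step(model, x_unlabelled, entropy_threshold=0.2):
    """One self-training iteration: accept pseudo-labels only where the
    predictive entropy falls below a (hypothetical) confidence threshold."""
    mean_probs, entropy = mc_dropout_predict(model, x_unlabelled)
    keep = entropy < entropy_threshold
    return x_unlabelled[keep], mean_probs[keep].argmax(dim=-1)
```

For the adaptation procedure, one plausible reading of "clustering of NN latent variable representations" is k-means++ over penultimate-layer activations, so that pseudo-labels can be reconciled per cluster when the target distribution differs from the training one. The sketch below assumes scikit-learn and a precomputed feature matrix; the cluster count is illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_latents(features: np.ndarray, n_clusters: int = 10):
    """Group samples by their latent representations using k-means++ seeding."""
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10)
    assignments = km.fit_predict(features)  # cluster index per sample, shape (N,)
    return km, assignments
```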
