Deep Active Learning by Model Interpretability

Recent successes of Deep Neural Networks (DNNs) on a variety of research tasks heavily rely on large amounts of labeled samples, which can incur considerable annotation cost in real-world applications. Fortunately, active learning is a promising methodology for training high-performing models with minimal annotation cost. In the deep learning context, the critical question of active learning is how to precisely identify the informativeness of samples for a DNN. In this paper, inspired by the piece-wise linear interpretability of DNNs, we introduce the linearly separable regions of samples to the problem of active learning and propose a novel Deep Active learning approach by Model Interpretability (DAMI). To preserve maximal representativeness of the entire unlabeled data, DAMI selects and labels samples from the different linearly separable regions induced by the piece-wise linear interpretability of the DNN. We focus on Multi-Layer Perceptrons (MLPs) for modeling tabular data: we use the local piece-wise interpretation in the MLP as the representation of each sample and directly run K-Center clustering over these representations to select samples to label. Notably, the whole DAMI process requires no hyper-parameters to be tuned manually. To verify the effectiveness of our approach, extensive experiments have been conducted on several tabular datasets. The experimental results demonstrate that DAMI consistently outperforms several state-of-the-art approaches.
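
To make the two-step pipeline concrete, below is a minimal sketch in PyTorch, not the paper's implementation. It assumes a ReLU MLP, for which the input gradient of the predicted logit equals the coefficient vector of the local linear function on the sample's linear region, and uses that gradient as the "local piece-wise interpretation" representation; the helper names `local_interpretations` and `k_center_greedy` are hypothetical, and the greedy farthest-point routine is a standard instantiation of K-Center selection, whose details may differ from the paper's.

```python
import torch
import torch.nn as nn

def local_interpretations(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-sample input gradient of the top logit.

    For a piecewise-linear (ReLU) MLP this gradient is exactly the weight
    vector of the local linear function on the sample's linear region, so it
    serves as the sample's local-interpretation representation.
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    # Sum the top logit over the batch so one backward pass yields all gradients.
    top = logits.gather(1, logits.argmax(dim=1, keepdim=True)).sum()
    top.backward()
    return x.grad.detach()

def k_center_greedy(reps: torch.Tensor, budget: int) -> list:
    """Greedy K-Center: repeatedly pick the point farthest from the chosen set."""
    chosen = [0]  # seed with an arbitrary point
    dists = torch.cdist(reps, reps[chosen]).squeeze(1)
    for _ in range(budget - 1):
        idx = int(torch.argmax(dists))  # farthest point from current centers
        chosen.append(idx)
        # Each point's distance to the chosen set is the min over all centers.
        dists = torch.minimum(dists, torch.cdist(reps, reps[idx:idx + 1]).squeeze(1))
    return chosen

# Usage sketch: representations first, then K-Center selection of query indices.
# reps = local_interpretations(mlp, unlabeled_x)
# query_ids = k_center_greedy(reps, budget=100)
```

Because both steps are deterministic given the current model, this selection loop matches the abstract's claim of having no manually tuned hyper-parameters beyond the labeling budget itself.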
