Towards Automated Error Analysis: Learning to Characterize Errors

Characterizing the patterns of errors that a system makes helps researchers focus future development on increasing its accuracy and robustness. We propose a novel form of “meta learning” that automatically learns interpretable rules characterizing the types of errors a system makes, and we demonstrate these rules’ ability to help understand and improve two NLP systems. Our approach works in three steps: collecting error cases on validation data, extracting meta-features that describe these samples, and learning rules that characterize errors using these features. We apply our approach to ViLBERT for Visual Question Answering and to RoBERTa for Common Sense Question Answering. The learned rules provide insights into systemic errors these systems make on the given tasks. Using these insights, we are also able to “close the loop” and modestly improve the performance of these systems.
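
To make the three steps concrete, the following is a minimal sketch and not the implementation used in this work. It assumes boolean meta-features and restricts itself to single-condition rules, ranking each rule by how much the error rate among matching validation samples exceeds the overall error rate. The feature names (`has_number`, `long_question`), the thresholds, and the function `learn_error_rules` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    feature: str       # name of the meta-feature the rule tests
    value: bool        # value the meta-feature must take
    support: int       # number of validation samples matching the rule
    error_rate: float  # fraction of matching samples the model got wrong


def learn_error_rules(samples, min_support=2, min_lift=1.2):
    """Return single-condition rules whose error rate exceeds the base rate.

    samples: list of (meta_features, is_error) pairs, where meta_features
    is a dict mapping feature name to bool and is_error marks a wrong
    prediction on a validation sample. Thresholds are illustrative.
    """
    if not samples:
        return []
    base_rate = sum(err for _, err in samples) / len(samples)
    if base_rate == 0:
        return []  # no errors to characterize
    names = sorted({name for feats, _ in samples for name in feats})
    rules = []
    for name in names:
        for value in (True, False):
            matched = [err for feats, err in samples if feats.get(name) == value]
            if len(matched) < min_support:
                continue
            rate = sum(matched) / len(matched)
            if rate / base_rate >= min_lift:
                rules.append(Rule(name, value, len(matched), rate))
    # Highest error rate first, ties broken by larger support.
    return sorted(rules, key=lambda r: (-r.error_rate, -r.support))


# Step 1: error cases collected on (toy) validation data.
# Step 2: each sample is described by hypothetical boolean meta-features.
validation = [
    ({"has_number": True, "long_question": False}, True),
    ({"has_number": True, "long_question": True}, True),
    ({"has_number": True, "long_question": False}, False),
    ({"has_number": False, "long_question": True}, False),
    ({"has_number": False, "long_question": False}, False),
    ({"has_number": False, "long_question": True}, False),
]

# Step 3: learn rules that characterize where errors concentrate.
rules = learn_error_rules(validation)
for r in rules:
    print(f"IF {r.feature} == {r.value} THEN error "
          f"(support={r.support}, error_rate={r.error_rate:.2f})")
```

On this toy data only the condition `has_number == True` is returned, since errors are concentrated in samples with that meta-feature. A rule learner that combines several conditions, such as a shallow decision tree, would be a natural extension of the same idea.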
