A Sentiment Analysis Dataset for Trustworthiness Evaluation

While deep learning models have greatly improved the performance of most artificial intelligence tasks, they are often criticized as untrustworthy because of their black-box nature. Consequently, many studies have been devoted to the trustworthiness of deep learning. However, because most open datasets are designed to evaluate the accuracy of model outputs, appropriate datasets for evaluating the inner workings of neural networks are still lacking, which hinders progress in trustworthiness research. To systematically evaluate the factors involved in building trustworthy systems, we therefore propose a novel, well-annotated sentiment analysis dataset for evaluating robustness and interpretability. The dataset contains diverse annotations covering the challenging distribution of instances, manually crafted adversarial instances, and sentiment explanations. We further propose several evaluation metrics for interpretability and robustness. Based on the dataset and metrics, we comprehensively compare the trustworthiness of three typical models and study the relations among accuracy, robustness, and interpretability. We release this trustworthiness evaluation dataset at https://github/xyz and hope our work facilitates progress toward building more trustworthy systems for real-world applications.
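
To make the setup concrete, the following is a minimal sketch of how a record in such a dataset might be structured and how simple robustness and interpretability scores could be computed. The field names (text, label, adversarial_text, rationale_tokens) and both metric definitions are illustrative assumptions, not the dataset's actual schema or the paper's proposed metrics.

```python
# Illustrative sketch only: hypothetical record schema and toy metrics,
# not the released dataset's format or the paper's evaluation protocol.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AnnotatedExample:
    text: str                    # original review sentence
    label: int                   # gold sentiment label (e.g., 0 = negative, 1 = positive)
    adversarial_text: str        # manually crafted adversarial variant
    rationale_tokens: List[str]  # human-annotated sentiment explanation tokens


def robustness_score(model: Callable[[str], int], data: List[AnnotatedExample]) -> float:
    """Fraction of correctly classified examples whose prediction survives the adversarial variant."""
    correct = [ex for ex in data if model(ex.text) == ex.label]
    if not correct:
        return 0.0
    kept = sum(1 for ex in correct if model(ex.adversarial_text) == ex.label)
    return kept / len(correct)


def rationale_f1(predicted: List[str], gold: List[str]) -> float:
    """Token-level F1 between a model's salient tokens and the human rationale."""
    overlap = len(set(predicted) & set(gold))
    if not predicted or not gold or overlap == 0:
        return 0.0
    precision, recall = overlap / len(predicted), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    example = AnnotatedExample(
        text="The plot is clever and the acting is superb.",
        label=1,
        adversarial_text="The plot is clever and the acting is not bad at all.",
        rationale_tokens=["clever", "superb"],
    )
    trivial_model = lambda s: 1  # placeholder classifier for the demo
    print(robustness_score(trivial_model, [example]))
    print(rationale_f1(["clever", "acting"], example.rationale_tokens))
```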
