Universal Adversarial Attacks on Spoken Language Assessment Systems

There is an increasing demand for automated spoken language assessment (SLA) systems, partly driven by the performance improvements that have come from deep learning based approaches. One aspect of deep learning systems is that they do not require expert-derived features and instead operate directly on the original signal, such as an automatic speech recognition (ASR) transcript. This, however, increases their potential susceptibility to adversarial attacks as a form of candidate malpractice. This paper explores the sensitivity of SLA systems to a universal black-box attack on the ASR text output. The aim is to obtain a single, universal phrase that maximally increases any candidate’s score. Four approaches to detecting such adversarial attacks are also described. All the systems, and the associated detection approaches, are evaluated on a free (spontaneous) speaking section from a Business English test. It is shown that, on deep learning based SLA systems, the average candidate score can be increased by almost one grade level using a single six-word phrase appended to the end of the response hypothesis. Although these large gains can be obtained, the attacks can be easily detected from the shift in score relative to a “traditional” Gaussian Process based grader.
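To make the attack setting concrete, the following is a minimal sketch of how a universal appended phrase might be searched for with only black-box access to a grader, and how a score shift against a second grader might flag it. The abstract does not specify the search procedure, so the greedy word-by-word search is an assumption, and `toy_deep_grader`, `toy_reference_grader`, the vocabulary, and the threshold are all hypothetical stand-ins rather than the systems evaluated in the paper.

```python
from typing import Callable, List, Sequence

Grader = Callable[[Sequence[str]], float]


def toy_deep_grader(words: Sequence[str]) -> float:
    """Hypothetical stand-in for a deep SLA grader (NOT the paper's model).

    It rewards response length and a few 'advanced' words, capped at 6.0.
    """
    advanced = {"consequently", "nevertheless", "strategic"}
    score = 0.1 * len(words) + sum(1.0 for w in words if w in advanced)
    return min(score, 6.0)


def toy_reference_grader(words: Sequence[str]) -> float:
    """Hypothetical stand-in for a feature-based reference grader.

    It only counts distinct words, so repeated words add nothing.
    """
    return min(0.1 * len(set(words)), 6.0)


def mean_score(grader: Grader, responses: Sequence[Sequence[str]],
               phrase: Sequence[str]) -> float:
    """Average score over all responses with `phrase` appended to each."""
    total = sum(grader(list(r) + list(phrase)) for r in responses)
    return total / len(responses)


def greedy_universal_phrase(grader: Grader,
                            responses: Sequence[Sequence[str]],
                            vocab: Sequence[str],
                            length: int) -> List[str]:
    """Assumed greedy search: add one word at a time, keeping the word that
    maximises the mean score. Only grader outputs are queried (black-box)."""
    phrase: List[str] = []
    for _ in range(length):
        best = max(vocab,
                   key=lambda w: mean_score(grader, responses, phrase + [w]))
        phrase.append(best)
    return phrase


def is_suspicious(words: Sequence[str], attacked: Grader,
                  reference: Grader, threshold: float = 1.0) -> bool:
    """Flag a response when the attacked grader's score exceeds the
    reference grader's score by more than a (hypothetical) threshold."""
    return attacked(words) - reference(words) > threshold
```

With a search of this kind, the same phrase is appended to every candidate's ASR hypothesis, which is what makes the attack universal. The detector relies on the second grader being less sensitive to the appended phrase than the attacked one, so a large gap between the two scores becomes a signal of malpractice.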
