Paper information - An Attention Based Deep Neural Network for Automatic Lexical Stress Detection

Lexical stress detection is an important task in self-directed language learning applications. We address this task by leveraging two successful attention techniques from natural language processing: inner attention and self-attention. First, combined with an LSTM that models time-series features, inner attention extracts the most important information and converts a variable-length input into a fixed-length feature vector. Second, self-attention intrinsically supports words with different numbers of syllables as input when modeling contextual information. In addition, our model is straightforward to extend with hand-crafted features to further improve performance, and it can also be applied to similar tasks, such as pitch accent detection. Experiments on LibriSpeech, TED-LIUM, and a third, self-recorded dataset show the high performance of our proposed attention-based neural network.
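The two attention steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frame vectors stand in for LSTM outputs, the `query` vector is a hypothetical fixed parameter that would be learned in a real model, and the self-attention uses identity projections instead of learned query/key/value matrices. The function names are also chosen here for illustration only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def inner_attention_pool(states: Sequence[Vector], query: Vector) -> Vector:
    """Collapse a variable-length sequence of frame vectors into one
    fixed-length vector (assumption: dot-product scoring against `query`)."""
    weights = softmax([dot(h, query) for h in states])
    dim = len(query)
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]


def self_attention(syllables: Sequence[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention over the syllables of one word
    (assumption: identity projections, single head)."""
    dim = len(syllables[0])
    scale = math.sqrt(dim)
    contextual = []
    for q in syllables:
        weights = softmax([dot(q, k) / scale for k in syllables])
        contextual.append(
            [sum(w * v[d] for w, v in zip(weights, syllables)) for d in range(dim)]
        )
    return contextual


# Two syllables with different numbers of frames (made-up values).
query = [0.5, -0.5]
syllable_a = [[0.1, 0.2], [0.4, 0.1], [0.3, 0.3]]   # 3 frames
syllable_b = [[0.9, 0.0], [0.7, 0.2]]               # 2 frames

pooled = [inner_attention_pool(s, query) for s in (syllable_a, syllable_b)]
contextual = self_attention(pooled)
print(len(pooled[0]), len(pooled[1]), len(contextual))
```

Each syllable ends up as a vector of the same size regardless of its frame count, and the self-attention step accepts any number of syllables per word, which is the property the abstract relies on.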
