Inferring Emphasis for Real Voice Data: An Attentive Multimodal Neural Network Approach
Wei Chen | Fei Yu | Yanfeng Wang | Jialie Shen | Fanbo Meng | Long Zhang | Suping Zhou | Jia Jia
[1] Maosong Sun, et al. Punctuation as Implicit Annotations for Chinese Word Segmentation, 2009, CL.
[2] Wei Chen, et al. Emphasis Detection for Voice Dialogue Applications Using Multi-channel Convolutional Bidirectional Long Short-Term Memory Network, 2018, 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[3] D. Ladd, et al. The perception of intonational emphasis: continuous or categorical?, 1997.
[4] Lan Wang, et al. Automatic lexical stress detection for Chinese learners' of English, 2010, 2010 7th International Symposium on Chinese Spoken Language Processing.
[5] Milos Cernak, et al. An empirical model of emphatic word detection, 2015, INTERSPEECH.
[6] Lianhong Cai, et al. Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Mattias Heldner, et al. Spectral emphasis as an additional source of information in accent detection, 2001.
[8] Martin Heckmann, et al. Evaluation of optical flow field features for the detection of word prominence in a human-machine interaction scenario, 2015, 2015 International Joint Conference on Neural Networks (IJCNN).
[9] Daniel Jurafsky, et al. The detection of emphatic words using acoustic and lexical features, 2005, INTERSPEECH.
[10] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[11] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.
[12] Tomoki Toda, et al. Preserving Word-Level Emphasis in Speech-to-Speech Translation, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[13] Yoon Kim, et al. Convolutional Neural Networks for Sentence Classification, 2014, EMNLP.
[14] Kristin Precoda, et al. Lexical stress classification for language learning using spectral and segmental features, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Jiebo Luo, et al. Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM, 2018, ACM Multimedia.
[16] Jürgen Schmidhuber, et al. Learning Precise Timing with LSTM Recurrent Networks, 2003, J. Mach. Learn. Res.
[17] Lianhong Cai, et al. Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition, 2016, INTERSPEECH.
[18] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[19] Qi Wang, et al. Inferring Emotion from Conversational Voice Data: A Semi-Supervised Multi-Path Generative Neural Network Approach, 2018, AAAI.
[20] Daniel P. W. Ellis, et al. Pitch-based emphasis detection for characterization of meeting recordings, 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).
[21] Frank K. Soong, et al. TTS synthesis with bidirectional LSTM based recurrent neural networks, 2014, INTERSPEECH.
[22] Martin Heckmann, et al. Integrating sequence information in the audio-visual detection of word prominence in a human-machine interaction scenario, 2014, INTERSPEECH.