Paper Information - A Hybrid Approach to Scalable and Robust Spoken Language Understanding in Enterprise Virtual Agents

Spoken language understanding (SLU) extracts the intended meaning from a user utterance and is a critical component of conversational virtual agents. In enterprise virtual agents (EVAs), language understanding is substantially challenging. First, the users are infrequent callers who are unfamiliar with the expectations of a pre-designed conversation flow. Second, the users are paying customers of an enterprise who demand a reliable, consistent and efficient user experience when resolving their issues. In this work, we describe a general and robust framework for intent and entity extraction utilizing a hybrid of statistical and rule-based approaches. Our framework includes confidence modeling that incorporates information from all components in the SLU pipeline, a critical addition for EVAs to ensure accuracy. Our focus is on creating accurate and scalable SLU that can be deployed rapidly for a large class of EVA applications with little need for human intervention.
