Detecting out-of-domain utterances addressed to a virtual personal assistant

Conversational understanding systems, especially virtual personal assistants (VPAs), perform “targeted” natural language understanding: they assume that users stay within the walled gardens of covered domains, and back off to generic web search otherwise. However, users are usually unaware of the concept of domains, and some do not distinguish the system from simple voice search. It therefore becomes important to identify the rejected out-of-domain utterances that are actually intended for the VPA. This paper presents a study of this new task and shows that how a request is uttered matters more than what is uttered, which makes the task resemble addressee detection or dialog act tagging. To this end, syntactic and semantic parse “structure” features are extracted in addition to lexical features and used to train a binary SVM classifier on a large number of random web search queries and VPA utterances from multiple domains. We present controlled experiments that leave one domain out and measure the precision of the model when the held-out utterances are combined with unseen queries. Our results indicate that the structure features yield higher precision, especially when the test domain bears little resemblance to the existing domains.
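The leave-one-domain-out protocol described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the function names, the toy data, and the first-word cue classifier are all hypothetical. The cue classifier is only a stand-in for the binary SVM trained on lexical and parse-structure features, which is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

# (utterance, label): 1 = intended for the VPA, 0 = web search query
Example = Tuple[str, int]
Classifier = Callable[[str], int]


def leave_one_domain_out(vpa_by_domain: Dict[str, List[str]],
                         train_queries: List[str],
                         unseen_queries: List[str],
                         held_out: str) -> Tuple[List[Example], List[Example]]:
    """Hide one VPA domain from training and test on it plus unseen queries."""
    train = [(u, 1) for d, utts in vpa_by_domain.items()
             if d != held_out for u in utts]
    train += [(q, 0) for q in train_queries]
    test = [(u, 1) for u in vpa_by_domain[held_out]]
    test += [(q, 0) for q in unseen_queries]
    return train, test


def precision(predicted: List[int], gold: List[int]) -> float:
    """Fraction of utterances predicted as VPA-directed that truly are."""
    tp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 0)
    return tp / (tp + fp) if tp + fp else 0.0


def train_first_word_cues(train: List[Example]) -> Classifier:
    """Hypothetical stand-in for the SVM: accept an utterance if its first
    word opened a VPA utterance, but no web query, in the training data."""
    vpa_first = {u.split()[0] for u, y in train if y == 1}
    query_first = {u.split()[0] for u, y in train if y == 0}
    cues = vpa_first - query_first
    return lambda utterance: int(utterance.split()[0] in cues)


def evaluate(train_fn: Callable[[List[Example]], Classifier],
             vpa_by_domain: Dict[str, List[str]],
             train_queries: List[str],
             unseen_queries: List[str]) -> Dict[str, float]:
    """Precision on each held-out domain, retraining once per domain."""
    scores = {}
    for domain in vpa_by_domain:
        train, test = leave_one_domain_out(
            vpa_by_domain, train_queries, unseen_queries, domain)
        classify = train_fn(train)
        predicted = [classify(u) for u, _ in test]
        scores[domain] = precision(predicted, [y for _, y in test])
    return scores


# Toy data, invented for illustration only.
VPA_BY_DOMAIN = {
    "weather": ["what is the weather today", "will it rain tomorrow"],
    "alarm": ["set an alarm for six", "what time is my alarm"],
    "music": ["play some jazz", "can you play the next song"],
}
TRAIN_QUERIES = ["weather seattle", "alarm clock app", "jazz radio"]
UNSEEN_QUERIES = ["pizza near me", "what is bitcoin"]

if __name__ == "__main__":
    print(evaluate(train_first_word_cues, VPA_BY_DOMAIN,
                   TRAIN_QUERIES, UNSEEN_QUERIES))
```

Any trainer that takes labeled examples and returns a classifier can be passed to `evaluate`, so an SVM over lexical and structure features could replace the cue classifier without changing the evaluation loop.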
