A Simultaneous Recognition Framework for the Spoken Language Understanding Module of Intelligent Personal Assistant Software on Smart Phones

Intelligent personal assistant software such as Apple’s Siri and Samsung’s S-Voice has recently been released. This paper introduces a novel Spoken Language Understanding (SLU) module that predicts the user’s intention in order to determine the system actions of intelligent personal assistant software. An SLU module usually consists of several recognition tasks connected in a pipeline framework, whereas the proposed SLU module performs four recognition tasks simultaneously in a single recognition framework using Conditional Random Fields (CRF). The four tasks are named entity, speech-act, target, and operation recognition. In the experiments, the new simultaneous recognition method achieves 4% higher performance and runs about 25% faster than a method using a pipeline framework. A significance test shows that this improvement is statistically significant, with a p-value smaller than 0.05.
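
The abstract does not say how the four tasks are combined into one recognition step. As a minimal sketch of one possible approach (an assumption, not necessarily the authors' method), the token-level named entity tags can be fused with the utterance-level speech-act, target, and operation labels into a single composite tag per token, so that one sequence labeler such as a CRF predicts all four outputs at once. The function names, the `|` separator, and the example labels below are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple

SEP = "|"  # hypothetical separator; assumes no individual label contains it


def encode_joint_labels(ne_tags: List[str], speech_act: str,
                        target: str, operation: str) -> List[str]:
    """Fuse per-token NE tags with utterance-level labels into one tag per token."""
    return [SEP.join((ne, speech_act, target, operation)) for ne in ne_tags]


def decode_joint_labels(joint: List[str]) -> Tuple[List[str], str, str, str]:
    """Split composite tags back into the four task outputs.

    Utterance-level labels are recovered by majority vote over tokens,
    since a sequence labeler may predict them inconsistently.
    """
    parts = [tag.split(SEP) for tag in joint]
    ne_tags = [p[0] for p in parts]

    def vote(index: int) -> str:
        return Counter(p[index] for p in parts).most_common(1)[0][0]

    return ne_tags, vote(1), vote(2), vote(3)


# Hypothetical utterance: "call John now"
ne = ["O", "B-PERSON", "O"]
joint = encode_joint_labels(ne, "request", "phone", "call")
print(joint)
print(decode_joint_labels(joint))
```

A pipeline would instead run one recognizer per task in sequence, so errors from an earlier stage propagate to later ones. The composite encoding avoids that at the cost of a larger label set.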
