Zero-shot semantic parser for spoken language understanding

Machine learning algorithms are now common in state-of-the-art spoken language understanding models. To perform well, however, they must be trained on a potentially large amount of data, which is not available for many tasks and languages of interest. In this work, we present a novel zero-shot learning method, based on word embeddings, that makes it possible to derive a full semantic parser for spoken language understanding. No annotated in-context data are needed: the ontological description of the target domain and generic word embedding features (learned from freely available general-domain data) suffice to derive the model. Two versions are studied, which differ in how the model parameters and the decoding step are handled; one of them extends the proposed approach to conditional random fields. We show that this model, with very little supervision, can immediately reach performance comparable to that obtained by either state-of-the-art, carefully handcrafted rule-based models or trained statistical models for the extraction of dialog acts on the Dialog State Tracking Challenge test datasets (DSTC2 and DSTC3).
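The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of labeling an utterance chunk from an ontological description plus generic word embeddings, with no annotated in-domain data. Everything in it is an assumption made for illustration: the toy 3-dimensional vectors, the `ONTOLOGY` dictionary, the averaging of word vectors, the cosine-similarity scoring, and the `threshold` parameter. It should not be read as the parser evaluated in the paper.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only. A real system
# would load embeddings learned from general-domain text.
EMBEDDINGS = {
    "cheap": [1.0, 0.0, 0.0],
    "inexpensive": [0.9, 0.1, 0.0],
    "north": [0.0, 1.0, 0.0],
    "northern": [0.1, 0.9, 0.0],
    "italian": [0.0, 0.0, 1.0],
    "pasta": [0.1, 0.0, 0.9],
}

# Hypothetical ontology: each semantic label is described by a few words.
ONTOLOGY = {
    "inform(pricerange=cheap)": ["cheap"],
    "inform(area=north)": ["north"],
    "inform(food=italian)": ["italian"],
}


def mean_vector(words):
    """Average the embeddings of the known words; None if no word is known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_label(chunk, threshold=0.5):
    """Return the ontology label closest to the chunk, or None if none
    scores above the (assumed) similarity threshold."""
    chunk_vec = mean_vector(chunk)
    if chunk_vec is None:
        return None
    best_label, best_score = None, threshold
    for label, description in ONTOLOGY.items():
        label_vec = mean_vector(description)
        if label_vec is None:
            continue
        score = cosine(chunk_vec, label_vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    for chunk in (["inexpensive"], ["northern"], ["pasta"], ["hello"]):
        print(chunk, "->", zero_shot_label(chunk))
```

In this sketch the words "inexpensive", "northern", and "pasta" never appear in the ontology, yet each is mapped to a label because its vector lies close to that of a describing word; a chunk made only of out-of-vocabulary words is left unlabeled.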
