Spoken Language Understanding in a Latent Topic-Based Subspace

Performance of spoken language understanding applications declines when spoken documents are automatically transcribed in noisy conditions, due to high Word Error Rates (WER). To improve robustness to transcription errors, recent work has proposed mapping these automatic transcriptions into a latent space, comparing classical topic-based representations such as Latent Dirichlet Allocation (LDA), supervised LDA, and author-topic (AT) models. An original compact representation, called c-vector, was recently introduced to sidestep the tricky choice of the number of latent topics in these topic-based representations. Moreover, c-vectors increase the robustness of document classification to transcription errors by compacting different LDA representations of the same speech document into a reduced space, thereby compensating for much of the noise in the document representation. The main drawback of this method is the number of sub-tasks needed to build the c-vector space. This paper proposes both to improve this compact representation (c-vector) of spoken documents and to reduce the number of required sub-tasks, using an original framework that builds a robust low-dimensional feature space from a set of AT models, called the "Latent Topic-based Sub-space" (LTS). Unlike LDA, the AT model considers not only the dialogue content (words) but also the class associated with the document. Experiments are conducted on the DECODA corpus, which contains speech conversations from the call center of the RATP Paris transportation company. Results show that the proposed LTS representation outperforms the best previous compact representation (c-vector), with a substantial gain of more than 2.5% in terms of correctly labeled conversations.
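The pipeline sketched in the abstract (several topic-based representations of the same document, compacted into one low-dimensional space) can be illustrated as follows. This is a minimal sketch, not the paper's actual method: the toy conversations, the topic counts, and the use of scikit-learn's LDA with a plain truncated SVD as the compaction step are all illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD

# Toy stand-in for automatically transcribed call-centre conversations.
conversations = [
    "lost my travel card on the metro this morning",
    "which bus line goes to the airport terminal",
    "my card was not recognised at the metro gate",
    "timetable for the night bus to the airport",
    "refund for a ticket bought twice by mistake",
    "the metro gate kept my ticket this morning",
]

counts = CountVectorizer().fit_transform(conversations)

# Train topic models of several granularities, since no single number of
# latent topics is clearly "right" -- the choice the compact
# representations discussed above are designed to work around.
views = []
for n_topics in (2, 3, 5):
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    views.append(lda.fit_transform(counts))  # per-document topic posteriors

# Concatenate the per-model representations of each conversation, then
# compact them into one low-dimensional feature space for classification.
stacked = np.hstack(views)  # shape: (n_docs, 2 + 3 + 5)
compact = TruncatedSVD(n_components=2, random_state=0).fit_transform(stacked)
print(compact.shape)  # (6, 2)
```

Each conversation ends up as a single short vector that pools evidence from all topic granularities; a theme classifier would then be trained on these compact features rather than on any one topic space.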
