A comparison of normalization techniques applied to latent space representations for speech analytics

In noisy environments, Automatic Speech Recognition (ASR) systems usually produce poor transcription quality, which in turn degrades the performance of speech analytics. Various methods have been proposed to compensate for the adverse effects of ASR errors, mainly by projecting transcribed words into an abstract space. In this paper, we seek to identify themes in dialogues from telephone conversation services using latent topic spaces estimated with latent Dirichlet allocation (LDA). As a result, a document can be represented by a vector containing the probability of its association with each topic estimated by LDA. This vector should nonetheless be normalized to condition document representations. We propose to compare the original LDA vector representation (without normalization) with two normalization approaches, the Eigen Factor Radial (EFR) and the Feature Warping (FW) methods, which have already been applied successfully in the speaker recognition field but have never been compared and evaluated in the context of a speech analytics task. Results show the value of these normalization techniques for theme identification tasks using automatic transcriptions. The EFR normalization approach yields gains of 3.67 and 3.06 points, respectively, over the absence of normalization and over the FW normalization technique.
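
To make the idea of normalizing topic vectors more concrete, the following is a minimal sketch of a rank-based feature warping step, written under several assumptions that are not stated in the abstract: ranks are computed per topic dimension across a whole collection of documents (rather than over a sliding window, as is usual in speaker verification), the target distribution is a standard normal, and the function name `feature_warp` and the toy vectors are hypothetical. It is an illustration of the general technique, not the implementation evaluated in the paper.

```python
from statistics import NormalDist


def feature_warp(vectors):
    """Warp each dimension of equal-length vectors toward a standard normal.

    Assumption: ranks are taken over the whole collection, one topic
    dimension at a time. Ties receive distinct ranks in input order.
    """
    n = len(vectors)
    dims = len(vectors[0])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    warped = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Sort document indices by their value in this dimension (ascending).
        order = sorted(range(n), key=lambda i: vectors[i][d])
        for rank, i in enumerate(order, start=1):
            # Map the mid-rank quantile to the matching normal quantile.
            warped[i][d] = std_normal.inv_cdf((rank - 0.5) / n)
    return warped


# Toy topic-probability vectors (hypothetical values, 3 documents x 3 topics).
docs = [
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.25, 0.25, 0.50],
]
warped_docs = feature_warp(docs)
```

After warping, each topic dimension depends only on the rank of a document within the collection, so the skewed distribution of raw probabilities is replaced by values spread symmetrically around zero.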
