Topic Identification and Discovery on Text and Speech

We compare the multinomial i-vector framework from the speech community with latent Dirichlet allocation (LDA), sparse additive generative models (SAGE), and latent semantic analysis (LSA) as feature learners for topic identification (topic ID) on multinomial speech and text data. We also compare the learned representations on their ability to discover topics, quantified by distributional similarity to gold-standard topics and by human interpretability. We find that topic ID and topic discovery are competing objectives. We argue that the text processing community should consider LSA and i-vectors more widely as pre-processing steps for downstream tasks, and we speculate about speech processing tasks that could benefit from more interpretable representations such as SAGE.
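For readers from the text side, the following is a minimal sketch of extracting a multinomial i-vector for a single document. It assumes the subspace multinomial formulation, in which a document's word distribution is softmax(m + Tw), where m holds background log-probabilities, T is a low-rank subspace matrix, and w has a standard-normal prior. The i-vector is taken to be the MAP estimate of w, found here by damped Newton ascent. The function names are hypothetical, and T is assumed to be already trained (it is random in the demo). This is an illustration of the general technique, not the paper's implementation.

```python
import numpy as np


def log_softmax(x):
    """Numerically stable log-softmax of a 1-D array."""
    x = x - x.max()
    return x - np.log(np.exp(x).sum())


def objective(w, counts, m, T):
    """Log-posterior of w up to a constant: multinomial log-likelihood of the
    document's counts under softmax(m + T w) plus a standard-normal log-prior."""
    return counts @ log_softmax(m + T @ w) - 0.5 * (w @ w)


def extract_ivector(counts, m, T, n_iter=50, tol=1e-8):
    """MAP estimate of the latent vector w for one document (hypothetical helper).

    counts: (V,) term counts for the document
    m:      (V,) background log-probabilities (assumed given)
    T:      (V, K) subspace matrix (assumed already trained)
    """
    w = np.zeros(T.shape[1])
    N = counts.sum()
    for _ in range(n_iter):
        p = np.exp(log_softmax(m + T @ w))
        # Gradient of the log-posterior with respect to w.
        grad = T.T @ (counts - N * p) - w
        if np.linalg.norm(grad) < tol:
            break
        # Negative Hessian: N * T^T (diag(p) - p p^T) T + I, positive definite.
        Tp = T.T @ p
        H = N * ((T.T * p) @ T - np.outer(Tp, Tp)) + np.eye(len(w))
        step = np.linalg.solve(H, grad)
        # Backtracking line search: halve the step until the objective stops
        # decreasing (or the step becomes negligible).
        f0, t = objective(w, counts, m, T), 1.0
        while objective(w + t * step, counts, m, T) < f0 and t > 1e-8:
            t *= 0.5
        w = w + t * step
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, K = 30, 4                                # toy vocabulary size and i-vector dimension
    m = log_softmax(rng.normal(size=V))         # hypothetical background model
    T = 0.5 * rng.normal(size=(V, K))           # hypothetical (random) subspace
    counts = rng.integers(0, 5, size=V).astype(float)
    print(extract_ivector(counts, m, T))
```

Each document's w is then a fixed-length feature vector that could, for example, be passed to a linear classifier for topic ID. Training T itself, typically by alternating updates of T and the per-document vectors over a corpus, is outside the scope of this sketch.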
