Multi-Speaker Language Modeling

In conventional language modeling, the words from only one speaker at a time are represented, even for conversational tasks such as meetings and telephone calls. In a conversational or meeting setting, however, speakers can have significant influence on each other. To recover such un-modeled inter-speaker information, we introduce an approach for conversational language modeling that considers words from other speakers when predicting words from the current one. By augmenting a normal trigram context, our new multi-speaker language model (MSLM) improves on both Switchboard and ICSI Meeting Recorder corpora. Using an MSLM and a conditional mutual information based word clustering algorithm, we achieve a 8.9% perplexity reduction on Switchboard and a 12.2% reduction on the ICSI Meeting Recorder data.

[1]  Geoffrey Zweig,et al.  The graphical models toolkit: An open source software system for speech and time-series processing , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Hang Li,et al.  Exploring Asymmetric Clustering for Statistical Language Modeling , 2002, ACL.

[3]  Shuntaro Isogai,et al.  Multi-Class Composite N-gram Language Model for Spoken Language Processing Using Multiple Word Clusters , 2001, ACL.

[4]  Jeff A. Bilmes,et al.  Factored Language Models and Generalized Parallel Backoff , 2003, NAACL.

[5]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[6]  Philip C. Woodland,et al.  Language modelling for Russian and English using words and classes , 2003, Comput. Speech Lang..

[7]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[8]  Jeff A. Bilmes,et al.  Novel approaches to Arabic speech recognition: report from the 2002 Johns-Hopkins Summer Workshop , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[9]  Hermann Ney,et al.  Improved backing-off for M-gram language modeling , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[10]  Andreas Stolcke,et al.  The ICSI Meeting Corpus , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[11]  F ChenStanley,et al.  An Empirical Study of Smoothing Techniques for Language Modeling , 1996, ACL.