Analytical comparison between position specific posterior lattices and confusion networks based on words and subword units for spoken document indexing

In this paper we analytically compare two widely accepted approaches to spoken document indexing, position specific posterior lattices (PSPL) and confusion networks (CN), in terms of retrieval accuracy and index size. The fundamental distinctions between the two approaches in terms of construction units, posterior probabilities, number of clusters, indexing coverage and space requirements are discussed in detail. A new approach to approximating subword posterior probabilities in a word lattice is also incorporated into PSPL and CN to handle the out-of-vocabulary (OOV) and rare word problems, which were left unaddressed in the original PSPL and CN approaches. Extensive experimental results on Chinese broadcast news segments indicate that PSPL offers higher accuracy than CN but requires much more disk space, while subword-based PSPL turns out to be very attractive because it lowers the storage cost while offering even higher accuracy.
