Divergence-Based Similarity Measure for Spoken Document Retrieval

We propose a novel divergence-based similarity measure for spoken document retrieval (SDR). We first derive a dynamic programming algorithm that measures the Kullback-Leibler divergence between two hidden Markov models (HMMs). The measure is then generalized to a graph matching algorithm that is efficient for SDR applications. The proposed approach compares the underlying acoustic models of the keywords and the target database, which alleviates the impact of a mismatched vocabulary or language model, e.g., across different domains. Experimental results on the Wall Street Journal (WSJ) database show that the proposed approach achieves performance comparable to that of the word-posterior-based approach and outperforms it when the language model is mismatched. The approach is promising for building open-vocabulary, domain-independent SDR applications.
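The idea of measuring divergence between two HMMs by dynamic programming can be illustrated with a minimal sketch. It is not the algorithm derived in the paper. It assumes left-to-right HMMs whose states each emit a single diagonal-covariance Gaussian, ignores transition probabilities and state durations, and aligns the two state sequences monotonically in the style of dynamic time warping. The names `kl_gauss_diag` and `aligned_divergence` are hypothetical.

```python
import math


def kl_gauss_diag(mean_p, var_p, mean_q, var_q):
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def aligned_divergence(states_a, states_b):
    """Minimum accumulated state-level KL over monotonic state alignments.

    Each state is a (mean, variance) pair of equal-length lists.
    Assumption: transition probabilities are ignored, so this is only a
    simplified stand-in for a divergence between full HMMs.
    """
    n, m = len(states_a), len(states_b)
    inf = float("inf")
    # cost[i][j]: best accumulated divergence aligning the first i states
    # of A with the first j states of B.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = kl_gauss_diag(*states_a[i - 1], *states_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both models
                cost[i - 1][j],      # advance in A only
                cost[i][j - 1],      # advance in B only
            )
    return cost[n][m]


# Two toy one-dimensional, two-state models (illustrative values only).
hmm_a = [([0.0], [1.0]), ([2.0], [1.0])]
hmm_b = [([0.0], [1.0]), ([3.0], [1.0])]
score = aligned_divergence(hmm_a, hmm_b)
```

A lower score means the two state sequences are acoustically closer under these assumptions. Because KL divergence is asymmetric, `aligned_divergence(a, b)` and `aligned_divergence(b, a)` generally differ.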
