Multimedia retrieval through indexing speech: an enterprise perspective

The institutional memory of enterprises increasingly consists of digital multimedia content, such as online lecture videos and presentations, archived meetings or conference calls, and voicemail. A key technology for managing such content efficiently is keyword search over the spoken audio, enabled by automatic speech recognition (ASR). A key lesson from deploying ASR-based indexing in enterprises is that multimedia content is often not stored in a centralized hosting application. Instead, it lives in a "long tail" of small teams' intranet sites, often built by technology enthusiasts who like to tinker and make creative use of technology. This calls for an indexing platform rather than a standalone application: one in which audio indexing is just one feature, which can be deployed in a "do-it-yourself" manner by people with limited IT skills, and which integrates with the existing information-management infrastructure. We present approaches to three challenges that arise from these requirements and are characteristic of the enterprise setting: (1) probabilistic indexing of word lattices instead of speech-to-text transcripts, to cope with limited recognition accuracy (often in the 50% range, due to a lack of matching acoustic and domain corpora); (2) phonetic search and vocabulary adaptation for indexing person names, domain terminology, and code names that are missing from a standard recognizer's vocabulary; and (3) approximations that implement probabilistic lattice indexing on top of existing industry-strength full-text search engines. The third approach maximizes reuse of and integration with existing tools and deployments, which reduces cost and enables non-speech experts to manage and operate the indexing and search system and to build or mash up line-of-business applications around it.
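To make challenge (1) concrete, the following is a minimal Python sketch of lattice-based indexing under a toy lattice format that is our assumption and not the abstract's. In this format, arcs are `(from_node, to_node, word, probability)` tuples, node ids are already in topological order, and each probability is a combined acoustic and language-model score in linear space. All function names (`arc_posteriors`, `build_index`, `search`, `to_pseudo_text`) are hypothetical. The sketch computes arc posteriors with the forward-backward algorithm and accumulates them into an inverted index of expected word counts per document. As a crude illustration of challenge (3), it also repeats each word in proportion to its expected count, so that a plain full-text engine's term frequency roughly tracks it. That last step is an assumed simplification, not the approximation the abstract refers to.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed toy format: (from_node, to_node, word, probability), linear-space scores.
Arc = Tuple[int, int, str, float]


def arc_posteriors(arcs: List[Arc], start: int, end: int) -> List[float]:
    """Forward-backward over a lattice whose node ids are in topological order."""
    nodes = sorted({a[0] for a in arcs} | {a[1] for a in arcs})
    alpha: Dict[int, float] = defaultdict(float)
    beta: Dict[int, float] = defaultdict(float)
    alpha[start] = 1.0
    beta[end] = 1.0
    for n in nodes:  # forward pass: alpha[n] is complete once all lower ids are done
        for s, e, _, p in arcs:
            if s == n:
                alpha[e] += alpha[s] * p
    for n in reversed(nodes):  # backward pass: beta[n] is complete once all higher ids are done
        for s, e, _, p in arcs:
            if e == n:
                beta[s] += p * beta[e]
    total = alpha[end]  # total probability mass of all complete paths
    return [alpha[s] * p * beta[e] / total for s, e, _, p in arcs]


def build_index(docs: Dict[str, List[Arc]], start: int, end: int) -> Dict[str, Dict[str, float]]:
    """Inverted index: word -> {doc_id: expected count of the word in that doc}."""
    index: Dict[str, Dict[str, float]] = defaultdict(dict)
    for doc_id, arcs in docs.items():
        for (_, _, word, _), post in zip(arcs, arc_posteriors(arcs, start, end)):
            index[word][doc_id] = index[word].get(doc_id, 0.0) + post
    return index


def search(index: Dict[str, Dict[str, float]], word: str) -> List[Tuple[str, float]]:
    """Rank documents by the expected count of the query word."""
    hits = index.get(word, {})
    return sorted(hits.items(), key=lambda kv: kv[1], reverse=True)


def to_pseudo_text(arcs: List[Arc], start: int, end: int, scale: int = 10) -> str:
    """Hypothetical approximation for a plain text indexer: repeat each word
    round(expected_count * scale) times so term frequency tracks expected count."""
    counts: Dict[str, float] = defaultdict(float)
    for (_, _, word, _), post in zip(arcs, arc_posteriors(arcs, start, end)):
        counts[word] += post
    return " ".join(w for w, c in sorted(counts.items()) for _ in range(round(c * scale)))


if __name__ == "__main__":
    docs = {
        # In "a" the 1-best path says "side", so a 1-best transcript would miss "seide".
        "a": [(0, 1, "seide", 0.4), (0, 1, "side", 0.6), (1, 2, "talk", 1.0)],
        # Unnormalized scores: forward-backward renormalizes them to posteriors.
        "b": [(0, 1, "seide", 0.3), (0, 1, "said", 0.1), (1, 2, "lattice", 1.0)],
    }
    idx = build_index(docs, start=0, end=2)
    print(search(idx, "seide"))
    print(to_pseudo_text(docs["a"], 0, 2))
```

With these toy lattices, a query for "seide" ranks document b first (posterior of about 0.75) but still returns document a (0.4), even though a 1-best transcript of a would contain only "side". This is the recall gain that motivates indexing lattices instead of transcripts. A production system would also keep time information for each arc and work in log space to avoid underflow on long lattices.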
