Multimedia retrieval through indexing speech: an enterprise perspective
The institutional memory of enterprises increasingly consists of digital multimedia content, such as online lecture videos and presentations, archived meetings or conference calls, and voicemail. A key technology for efficiently managing such content is keyword search into the spoken audio content using automatic speech recognition (ASR).
A key lesson from deploying ASR-based indexing in enterprises is that multimedia content is often not stored in a centralized hosting application, but in a "long tail" of small teams' intranet sites, often built by technology enthusiasts who like to tinker and make creative use of technology. This calls for an indexing platform rather than a standalone app: one in which audio indexing is just one feature, that is easy to deploy with limited IT skills in a "do-it-yourself" manner, and that integrates with the existing information-management infrastructure.
We will present approaches to three enterprise-characteristic challenges arising from these requirements: (1) probabilistic indexing of word lattices instead of speech-to-text transcripts, to address the limited recognition accuracy (often in the 50% range due to lack of matching acoustic/domain corpora); (2) phonetic search and vocabulary adaptation for indexing person names, domain terminology, and code names missing from a standard recognizer; and (3) approximations to implement probabilistic lattice indexing on top of existing industry-strength full-text search engines, for maximal reuse and integration with existing tools and deployments to reduce cost, and to enable non-speech experts to manage and operate the indexing/search system and to build or mash up line-of-business applications around it.
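To make point (1) concrete, the following is a minimal sketch of probabilistic indexing, not the authors' implementation. It assumes each lattice has already been reduced to word hypotheses with posterior probabilities; the data, the `min_posterior` pruning threshold, and the function names `build_index` and `search` are all hypothetical. Documents are ranked by the expected count of the keyword, that is, the sum of the posteriors of its hypotheses.

```python
from collections import defaultdict

# Hypothetical lattice hypotheses per document: (word, start_time_sec, posterior).
# In a real system the posteriors would come from the recognizer's lattices.
lattices = {
    "meeting_01": [("project", 1.2, 0.90), ("budget", 1.8, 0.55),
                   ("budge", 1.8, 0.30), ("review", 2.4, 0.80)],
    "voicemail_07": [("budget", 0.5, 0.35), ("bucket", 0.5, 0.60)],
}

def build_index(lattices, min_posterior=0.05):
    """Inverted index: word -> list of (doc_id, start_time, posterior)."""
    index = defaultdict(list)
    for doc_id, hypotheses in lattices.items():
        for word, start, posterior in hypotheses:
            # Prune very unlikely hypotheses to keep the index small (assumed threshold).
            if posterior >= min_posterior:
                index[word].append((doc_id, start, posterior))
    return index

def search(index, keyword):
    """Rank documents by the expected count of the keyword (sum of posteriors)."""
    expected = defaultdict(float)
    for doc_id, _start, posterior in index.get(keyword, []):
        expected[doc_id] += posterior
    return sorted(expected.items(), key=lambda item: item[1], reverse=True)

index = build_index(lattices)
results = search(index, "budget")
```

In this toy data, a best-path transcript of `voicemail_07` would contain only "bucket", so a transcript-based index would miss the document for the query "budget"; the lattice-based index still returns it, with a lower score.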