Paper Information - Speech-Annotated Photo Retrieval Using Syllable-Transformed Patterns

Speech-Annotated Photo Retrieval Using Syllable-Transformed Patterns

This study presents a novel indexing and retrieval scheme for speech-annotated digital photos, based on syllable-transformed image-like patterns. Speech recognition errors and out-of-vocabulary (OOV) words generally lead to incorrect indexing and degrade retrieval performance. To mitigate recognition errors, the n-best recognition candidates are transformed into an image-like pattern using multidimensional scaling. A hybrid mechanism that integrates syllables, characters, words, and image-like patterns is then used for speech indexing and retrieval. Experiments show that hybrid indexing with the syllable-transformed image-like patterns achieves better retrieval results than previous indexing methods.
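The abstract names multidimensional scaling as the step that maps recognized candidates into a spatial layout, but gives no implementation details. As a rough illustration of that one technique only, the sketch below applies classical (metric) MDS to a small dissimilarity matrix. The syllable labels, the dissimilarity values, and the function name `classical_mds` are hypothetical and are not taken from the paper. How the authors derive dissimilarities or build the final image-like pattern is not stated here.

```python
import numpy as np


def classical_mds(dissim, n_components=2):
    """Embed items in n_components dimensions from a symmetric dissimilarity matrix."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities: B = -1/2 * J * D^2 * J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues that arise from non-Euclidean input or rounding
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical toy dissimilarities between four syllables (illustrative values only)
syllables = ["ba", "pa", "ma", "shi"]
dissim = np.array([
    [0.0, 1.0, 2.0, 5.0],
    [1.0, 0.0, 2.0, 5.0],
    [2.0, 2.0, 0.0, 5.0],
    [5.0, 5.0, 5.0, 0.0],
])
coords = classical_mds(dissim)  # one 2-D coordinate per syllable
```

In this toy setup, syllables given small dissimilarities land close together in the 2-D layout, which is the property an image-like pattern would presumably rely on to tolerate confusions among acoustically similar candidates.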

Chung-Hsien Wu | Yu-Sheng Lai | Chien-Lin Huang | Wei-Chuan Lee
