Speech-Annotated Photo Retrieval Using Syllable-Transformed Patterns

This study presents a novel indexing and retrieval scheme for digital photos with speech annotations, based on syllable-transformed image-like patterns. Speech recognition errors and out-of-vocabulary (OOV) words generally cause incorrect indexing and degrade retrieval performance. To cope with recognition errors, the recognized n-best candidates are transformed into an image-like pattern using multidimensional scaling. A hybrid mechanism integrating syllables, characters, words, and image-like patterns is then used for speech indexing and retrieval. Experiments show that the hybrid indexing method incorporating the syllable-transformed image-like patterns achieves better retrieval results than previous indexing methods.
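As an illustration of the multidimensional scaling step only, the following is a minimal sketch of classical (Torgerson) metric MDS in plain Python. It is not the authors' implementation: the function name `classical_mds`, the toy dissimilarity matrix, and the use of power iteration are assumptions made for this example, and the abstract does not specify how syllable dissimilarities are defined or how the resulting coordinates are arranged into an image-like pattern.

```python
import math


def classical_mds(dist, dims=2, iters=200):
    """Embed n items in `dims` dimensions from a symmetric dissimilarity matrix.

    Hypothetical helper for illustration; assumes the dissimilarities are
    (approximately) Euclidean, so the leading eigenvalues are positive.
    """
    n = len(dist)
    # Double-center the squared dissimilarities: B = -1/2 * J * D^2 * J.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the dominant eigenvector of B.
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the (signed) eigenvalue estimate.
        vnorm2 = sum(x * x for x in v)
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigval = sum(v[i] * bv[i] for i in range(n)) / vnorm2
        if eigval <= 1e-12:
            break  # no remaining positive variance to embed
        unit = [x / math.sqrt(vnorm2) for x in v]
        for i in range(n):
            coords[i][d] = math.sqrt(eigval) * unit[i]
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - eigval * unit[i] * unit[j] for j in range(n)]
             for i in range(n)]
    return coords


# Toy dissimilarities among three hypothetical syllables (made-up values).
toy_dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
toy_coords = classical_mds(toy_dist, dims=1)
```

In this sketch, items with small dissimilarity land close together in the embedded space, which is the property that lets acoustically similar recognition candidates occupy nearby positions in a pattern.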
