Retrieval of Mandarin broadcast news using spoken queries

Given the monosyllabic structure of the Chinese language, a whole class of indexing features based on syllable-level statistical characteristics has previously been investigated for retrieval of Mandarin broadcast news. This paper presents the improvements achieved over those earlier results. The major differences are: (1) Multi-scale character- and word-level indexing terms have been integrated with the syllable-level information. (2) Information cues from a contemporary newswire text corpus have been used to create more accurate syllable indexing terms. (3) Automatic document expansion, blind relevance feedback, and query expansion via a term association matrix have been applied in retrieval. With all these schemes combined, average precision improves from 55.46% to 71.29%.
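As a rough illustration of query expansion via a term association matrix, the following is a minimal Python sketch. The abstract does not say how the matrix is built, so the cosine-normalised document co-occurrence score, the pinyin-style syllable-pair terms, and the names `build_association` and `expand_query` are all assumptions made for this example rather than the authors' method.

```python
from collections import Counter
from itertools import combinations
import math


def build_association(docs):
    """Build a symmetric term association table from document co-occurrence.

    Assumed scoring (not taken from the paper):
    assoc[a][b] = cooccurrence(a, b) / sqrt(df(a) * df(b)).
    """
    df = Counter()   # number of documents containing each term
    co = Counter()   # number of documents containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc))
        df.update(terms)
        for a, b in combinations(terms, 2):
            co[(a, b)] += 1

    assoc = {}
    for (a, b), n in co.items():
        score = n / math.sqrt(df[a] * df[b])
        assoc.setdefault(a, {})[b] = score
        assoc.setdefault(b, {})[a] = score
    return assoc


def expand_query(query, assoc, top_k=2, min_score=0.0):
    """Append up to top_k terms most strongly associated with the query."""
    scores = Counter()
    for term in query:
        for other, s in assoc.get(term, {}).items():
            if other not in query:
                scores[other] += s
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    extra = [t for t, s in ranked[:top_k] if s >= min_score]
    return list(query) + extra


# Hypothetical documents, each a list of syllable-pair indexing terms.
docs = [
    ["tai_bei", "shi_zhang", "xuan_ju"],
    ["tai_bei", "shi_zhang"],
    ["xuan_ju", "tou_piao"],
]
assoc = build_association(docs)
expanded = expand_query(["tai_bei"], assoc, top_k=1)
```

Here `expanded` adds the term that co-occurs most consistently with the query term. In a real system the association scores would be estimated from a much larger collection, and the added terms would typically be down-weighted relative to the original query terms.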
