Adaptation to Pronunciation Variations in Indonesian Spoken Query-Based Information Retrieval

Recognition errors on proper nouns and foreign words significantly degrade the performance of ASR-based speech applications such as voice dialing, speech summarization, spoken document retrieval, and spoken query-based information retrieval (IR). Proper nouns and words from other languages are usually the most important keywords, so misrecognizing them causes a loss of significant information from the speech source. This paper focuses on improving the performance of Indonesian ASR by alleviating the problem of pronunciation variation in proper nouns and foreign words (English words in particular). To improve proper noun recognition accuracy, proper-noun-specific acoustic models are created by supervised adaptation using maximum likelihood linear regression (MLLR). To improve English word recognition, the pronunciations of English words in the lexicon are corrected using a rule-based English-to-Indonesian phoneme mapping. The effectiveness of the proposed method was confirmed through spoken query-based Indonesian IR. We used Inference Network-based (IN-based) IR and compared its results with those of the classical Vector Space Model (VSM) IR, both using a tf-idf weighting scheme. Experimental results show that IN-based IR outperforms VSM IR.
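
For readers unfamiliar with the VSM baseline, the following is a minimal sketch of tf-idf weighting with cosine ranking. It is an illustration only: the abstract does not specify the exact tf-idf variant, so raw term frequency and idf = log(N/df) are assumed here, and the function names (`tfidf_vectors`, `cosine`, `rank`) and the toy Indonesian tokens are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build tf-idf vectors for tokenized documents.

    Assumption: raw term frequency and idf = log(N / df); the paper's
    exact weighting variant is not stated in the abstract.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return document indices ordered by descending cosine similarity."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection receive zero weight.
    q_vec = {term: tf * idf.get(term, 0.0) for term, tf in Counter(query).items()}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    # Ties are broken by document index to keep the order deterministic.
    return [i for _, i in sorted(scores, key=lambda pair: (-pair[0], pair[1]))]
```

In a spoken-query setting, `query` would be the token sequence produced by the recognizer. If a rare keyword is misrecognized as an unrelated word, its high idf weight is lost from the query vector, which illustrates why keyword recognition errors hurt retrieval.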
