Improving Speech Recognizer Using Neuro-genetic Weights Connection Strategy for Spoken Query Information Retrieval

This paper describes the integration of a speech recognizer into an information retrieval (IR) system to retrieve text documents relevant to spoken queries. Our aim is to improve the speech recognizer, since it has proven crucial as the front end of a spoken query IR system. When speech is the source material for indexing and retrieval, the effect of transcription errors on retrieval effectiveness must be considered. We therefore propose a dynamic weights connection strategy that combines two artificial intelligence (AI) learning algorithms, genetic algorithms (GA) and neural networks (NN), to improve the speech recognizer. The two algorithms are separate modules and are used to find the optimum weights for the hidden and output layers of a feed-forward artificial neural network (ANN) model. A mutated GA technique is proposed and compared with the standard GA technique. One hundred experiments were conducted using 50 words selected from spontaneous speech. To evaluate speech recognition performance, we used the standard word error rate (WER); to evaluate retrieval performance, we used precision and recall with respect to manual transcriptions. The proposed method yielded 95.39% recognition performance on spoken query input, reducing the error rate to 4.61%. For retrieval, our mutated GA+ANN model achieved 91% precision and 83% recall. Notably, the degradation in precision and recall matches the degradation in the recognition performance of the speech recognition engine. These results suggest that combining GA with ANN offers advantages while attaining sufficient accuracy.
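
The evaluation measures named above can be illustrated with a minimal Python sketch. It assumes the textbook definitions: WER as the word-level edit distance (substitutions, deletions, insertions) divided by the reference length, and precision and recall computed over sets of document identifiers. The function names, the sample query, and the document IDs are hypothetical, and the abstract does not specify how the relevant set is derived from the manual transcriptions, so the sketch simply takes it as given.

```python
from typing import Sequence, Set, Tuple


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(reference)


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a retrieved set against a relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical spoken query and recognizer output.
    ref = "find the weather report".split()
    hyp = "find a weather".split()
    print(word_error_rate(ref, hyp))

    # Hypothetical document IDs.
    print(precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"}))
```

Under these definitions, and when no words are inserted, recognition accuracy and WER are complementary, which is consistent with the reported 95.39% recognition performance and 4.61% error rate.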
