Utilizing Long Distance Word Dependencies for Automatic Speech Recognition

Statistical language models are widely used in natural language processing (NLP) applications. The n-gram has long proven to be a useful word representation technique for language models. However, an n-gram model assumes that the probability of any word in a sequence depends only on the previous n-1 consecutive words. Investigating long distance dependencies (LDDs) is therefore an important research area, because LDDs capture word relationships beyond the n-1 preceding words. LDDs aim at finding word co-occurrences while relaxing the consecutiveness constraint, using a wider window than the two or three previous words. That is, LDDs are a set of association rules that go beyond the scope of n-grams. One possible use of LDDs is N-best hypothesis rescoring in automatic speech recognition (ASR) systems. In this paper, we used the textual part of a speech corpus that contains 6,145 short sentences. The experimental results show that the predictive Apriori data-mining algorithm is a suitable candidate for generating frequently occurring LDDs, which contain both consecutive and nonconsecutive word relationships. The study also reveals that extracting LDDs is a computationally expensive task that requires a high-performance computing (HPC) environment.
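
To make the idea of relaxing the consecutiveness constraint concrete, the following minimal sketch counts ordered word pairs that co-occur within a window of following words and reports their support and confidence. It is an illustration only, not the predictive Apriori algorithm used in the paper: the function name `mine_word_pairs`, the parameters `window` and `min_support`, the sentence-level definition of support, and the toy corpus are all assumptions made for this example.

```python
from collections import Counter


def mine_word_pairs(sentences, window=5, min_support=2):
    """Return (left, right, support, confidence) for ordered word pairs.

    A pair (left, right) is counted once per sentence if `right` occurs
    within `window` tokens after `left`. Support is the number of sentences
    containing the pair; confidence is support divided by the number of
    sentences containing `left`. (Hypothetical helper, simplified on purpose.)
    """
    pair_support = Counter()
    word_support = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        pairs = set()
        for i, left in enumerate(tokens):
            # Look ahead up to `window` tokens, consecutive or not.
            for right in tokens[i + 1 : i + 1 + window]:
                pairs.add((left, right))
        pair_support.update(pairs)
        word_support.update(set(tokens))

    rules = []
    for (left, right), support in pair_support.items():
        if support >= min_support:
            confidence = support / word_support[left]
            rules.append((left, right, support, confidence))
    # Most frequent pairs first; ties broken by confidence, then alphabetically.
    rules.sort(key=lambda r: (-r[2], -r[3], r[0], r[1]))
    return rules


# Toy corpus, invented for illustration (not the corpus used in the paper).
toy_corpus = [
    "the minister arrived in the capital today",
    "the minister of finance arrived late",
    "a new minister arrived yesterday",
]

rules = mine_word_pairs(toy_corpus, window=4, min_support=2)
for left, right, support, confidence in rules:
    print(f"{left} -> {right}  support={support}  confidence={confidence:.2f}")
```

With `window=1` the function only sees adjacent words, much like a bigram count; widening the window lets it also pick up pairs such as "minister ... arrived" that are separated by other words in some sentences. The nested look-ahead loop also hints at why the search becomes expensive as the window and corpus grow.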
