Acoustic backing-off as an implementation of missing feature theory

Abstract

In this paper, we discuss acoustic backing-off as a method to improve the robustness of automatic speech recognition. Acoustic backing-off pursues the same objective as the marginalization approach of missing feature theory: it effectively removes the detrimental influence of outlier values from the local distance computation in the Viterbi algorithm. The method is based on a principle of robust statistical pattern matching: during recognition, the local distance function (LDF) is modeled as a mixture of the distribution observed during training and a distribution describing observations not seen during training. To assess the effectiveness of the method, we applied artificial distortions to the acoustic vectors in connected digit recognition over telephone lines. We found that acoustic backing-off restores recognition performance almost to the level obtained with undisturbed features, even in cases where a conventional LDF fails completely. These results show that recognition robustness can be improved with a marginalization approach in which the distinction between reliable and corrupted feature values is built into the recognition process. The results also show that acoustic backing-off is not limited to feature representations based on filter-bank outputs. Finally, they indicate that acoustic backing-off is much less effective when local distortions are smeared over all vector elements. The acoustic pre-processing steps should therefore be chosen with care, so that within-vector feature transformations disperse local distortions over as few acoustic vector elements as possible.
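
The mixture-based local distance described above can be illustrated with a small sketch. It assumes diagonal-covariance Gaussian states, a per-element mixture of the training density with a uniform density over an assumed dynamic range [lo, hi], and a mixing weight eps. The function names and all numeric values are hypothetical and chosen for illustration; they are not taken from the paper. The last part uses an unnormalized DCT-II as one example of a within-vector transformation, to show how a distortion in a single filter-bank element spreads over all cepstral coefficients.

```python
import math

def gaussian_neg_log(x, mean, var):
    """Conventional per-element local distance: -log N(x; mean, var)."""
    return 0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def backed_off_neg_log(x, mean, var, eps, lo, hi):
    """Per-element local distance with acoustic backing-off (sketch).

    The training density N(mean, var) is mixed with a uniform density on
    [lo, hi] that stands in for observations not seen during training.
    Assumption: x lies in [lo, hi], so the uniform term acts as a constant floor.
    """
    trained = math.exp(-gaussian_neg_log(x, mean, var))  # underflows to 0.0 for outliers
    unseen = 1.0 / (hi - lo)
    return -math.log((1.0 - eps) * trained + eps * unseen)

def local_distance(frame, means, variances, eps=None, lo=-10.0, hi=10.0):
    """Sum per-element distances for one diagonal-covariance Gaussian state.

    eps=None gives the conventional LDF; a value in (0, 1) enables backing-off.
    """
    total = 0.0
    for x, m, v in zip(frame, means, variances):
        if eps is None:
            total += gaussian_neg_log(x, m, v)
        else:
            total += backed_off_neg_log(x, m, v, eps, lo, hi)
    return total

def dct_ii(v):
    """Unnormalized DCT-II, e.g. log filter-bank energies -> cepstra."""
    n = len(v)
    return [sum(v[k] * math.cos(math.pi * i * (k + 0.5) / n) for k in range(n))
            for i in range(n)]

# Toy state and frames: one element of the corrupted frame is an outlier.
means, variances = [0.0] * 8, [1.0] * 8
clean = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.0, -0.3]
corrupt = list(clean)
corrupt[2] = 9.0

for label, eps in (("conventional", None), ("backed-off", 0.01)):
    d_clean = local_distance(clean, means, variances, eps)
    d_corrupt = local_distance(corrupt, means, variances, eps)
    print(f"{label:12s} clean={d_clean:6.2f} corrupt={d_corrupt:6.2f}")

# Smearing: the same single-element distortion touches every cepstral coefficient.
changed = sum(abs(a - b) > 1e-9 for a, b in zip(dct_ii(clean), dct_ii(corrupt)))
print("cepstral coefficients changed:", changed, "of", len(clean))
```

In this sketch, the backed-off contribution of any single element is bounded by -log(eps / (hi - lo)), so one outlier can no longer dominate the distance of a frame. After the DCT, however, the same outlier perturbs every coefficient, which corresponds to the situation in which acoustic backing-off is reported to be much less effective.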
