Source and channel coding for speech transmission and remote speech recognition

This dissertation addresses the design of source and channel coding techniques for two speech processing applications: speech transmission and remote speech recognition. In the first part, we present adaptive multi-rate (AMR) speech transmission systems that switch between operating modes depending on channel conditions. The adaptive scheme is built from variable bit rate embedded source encoders and rate-compatible channel coders that provide unequal error protection. We introduce a novel technique, the rate-compatible punctured trellis code (RCPT), which obtains unequal error protection by progressively puncturing symbols in a trellis, and compare it with the rate-compatible punctured convolutional code with and without bit-interleaved coded modulation. The proposed perceptually based speech coder displays a wide range of bit error sensitivities and is used in combination with rate-compatible punctured channel codes that provide adequate levels of protection. The resulting system operates over a wide range of channel conditions, with graceful performance degradation as the channel signal-to-noise ratio decreases.

In the second part, we present a framework for developing source coding, channel coding, channel decoding, and frame erasure concealment techniques adapted to remote speech recognition applications. We show that speech recognition, unlike speech coding, is more sensitive to channel errors than to channel erasures, and we determine appropriate channel coding design criteria. For channel decoding, we introduce a novel technique that combines soft decision decoding with error detection; it outperforms the widely used hard decision strategy. In addition, frame erasure concealment techniques are applied at the decoder to deal with unreliable frames. At the recognition stage, we present a technique that modifies the recognition engine to take into account the time-varying reliability of the decoded features after channel transmission. The resulting engine, referred to as weighted Viterbi recognition (WVR), further improves recognition accuracy. Together, source coding, channel coding, and the modified recognition engine are shown to provide good recognition accuracy over a wide range of communication channels at very low bit rates.
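
The idea of letting frame reliability influence the Viterbi search can be illustrated with a small sketch. The abstract does not give the exact weighting rule, so the code below assumes one common form: each frame's observation log-likelihood is multiplied by a reliability weight between 0 and 1 (equivalently, the likelihood is raised to that power), so that a weight of 0 makes the frame uninformative and the path is decided by the transition model alone. The function name `weighted_viterbi`, its arguments, and the toy two-state model are hypothetical and are not taken from the dissertation.

```python
import math


def weighted_viterbi(log_init, log_trans, log_obs, weights):
    """Viterbi search with per-frame reliability weights (illustrative sketch).

    log_init[j]     : log prior of starting in state j
    log_trans[i][j] : log probability of moving from state i to state j
    log_obs[t][j]   : log-likelihood of frame t under state j
    weights[t]      : assumed reliability of frame t, in [0, 1];
                      0 ignores the frame, 1 is ordinary Viterbi
    Returns (best_state_path, best_log_score).
    """
    n_states = len(log_init)
    # Initialise with the weighted likelihood of the first frame.
    score = [log_init[j] + weights[0] * log_obs[0][j] for j in range(n_states)]
    backpointers = []

    for t in range(1, len(log_obs)):
        new_score, pointers = [], []
        for j in range(n_states):
            # Best predecessor for state j at time t.
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + weights[t] * log_obs[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    best_score = score[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_score


# Toy two-state model; the middle frame is assumed to be corrupted
# and strongly (but wrongly) favours state 1.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.8), math.log(0.2)],
             [math.log(0.2), math.log(0.8)]]
log_obs = [[math.log(0.95), math.log(0.05)],
           [math.log(0.001), math.log(0.999)],
           [math.log(0.95), math.log(0.05)]]

trusted_path, _ = weighted_viterbi(log_init, log_trans, log_obs, [1.0, 1.0, 1.0])
weighted_path, _ = weighted_viterbi(log_init, log_trans, log_obs, [1.0, 0.0, 1.0])
```

In this toy case, trusting every frame lets the corrupted middle frame pull the path into state 1, while giving that frame zero weight keeps the path in state 0. How the weights would be derived from channel decoding reliability is outside the scope of this sketch.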
