A packetization and variable bitrate interframe compression scheme for vector quantizer-based distributed speech recognition

We propose a novel packetization and variable bitrate compression scheme for distributed speech recognition (DSR) source coding, based on the Group of Pictures concept from video coding. The proposed algorithm simultaneously packetizes and further compresses source-coded features by exploiting the high interframe correlation of speech, and is compatible with a variety of vector quantizer (VQ)-based DSR source coders. The algorithm approximates vector quantizers as Markov chains and trains the corresponding probability parameters empirically. Feature frames are then compressed as I-frames, P-frames, or B-frames using Huffman tables. The proposed scheme can perform lossless compression, but is also robust to lossy compression through VQ pruning or frame puncturing. To illustrate its effectiveness, we applied the proposed algorithm to the ETSI DSR source coder. The algorithm provided compression rates of up to 31.60% with negligible degradation in recognition accuracy, and rates of up to 71.15% with performance degradation under 1.0%.
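The core idea of coding VQ indices conditionally on the previous frame can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the add-one smoothing, the fixed group length `gop`, and the restriction to I-frames and P-frames (no B-frames, pruning, or puncturing) are all assumptions made for illustration. I-frames carry the raw fixed-length VQ index, and P-frames are Huffman coded with a table selected by the previous index, with transition counts estimated from training sequences.

```python
import heapq
from collections import Counter, defaultdict
from itertools import count


def huffman_code(freqs):
    """Return {symbol: bitstring} for a dict of symbol -> positive weight."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique counter so dicts are never compared in the heap
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def train_tables(sequences, codebook_size):
    """Count index transitions and build one Huffman table per previous index."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    tables = {}
    for prev in range(codebook_size):
        # Assumption: add-one smoothing keeps unseen transitions encodable,
        # so the round trip stays lossless.
        freqs = {cur: counts[prev][cur] + 1 for cur in range(codebook_size)}
        tables[prev] = huffman_code(freqs)
    return tables


def encode(seq, tables, codebook_size, gop=4):
    """Encode VQ indices: every gop-th frame is an I-frame, the rest P-frames."""
    index_bits = max(1, (codebook_size - 1).bit_length())
    out = []
    for t, idx in enumerate(seq):
        if t % gop == 0:
            out.append(format(idx, f"0{index_bits}b"))  # I-frame: raw index
        else:
            out.append(tables[seq[t - 1]][idx])  # P-frame: conditional code
    return "".join(out)


def decode(bits, n_frames, tables, codebook_size, gop=4):
    """Invert encode(); relies on Huffman codes being prefix-free."""
    index_bits = max(1, (codebook_size - 1).bit_length())
    inverse = {p: {c: s for s, c in t.items()} for p, t in tables.items()}
    seq, pos = [], 0
    for t in range(n_frames):
        if t % gop == 0:
            seq.append(int(bits[pos:pos + index_bits], 2))
            pos += index_bits
        else:
            table = inverse[seq[-1]]
            end = pos + 1
            while bits[pos:end] not in table:
                end += 1
                if end > len(bits):
                    raise ValueError("truncated or invalid bitstream")
            seq.append(table[bits[pos:end]])
            pos = end
    return seq
```

In this sketch, a sequence whose indices change slowly spends few bits on P-frames, because the self-transition receives the shortest codeword in each table. The periodic I-frames bound how far a lost or corrupted frame can propagate, which is the role the Group of Pictures structure plays in video coding.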
