Packet loss concealment based on VQ replicas and MMSE estimation applied to distributed speech recognition

This paper proposes a new packet loss concealment technique that adds a few forward error correction (FEC) bits, representing data replicas, to each packet and combines them with minimum mean square error (MMSE) estimation. The technique is developed for an Aurora-2 distributed speech recognition system operating over an IP network. In addition to the data for the transmitted speech frames, each packet carries FEC bits that encode coarsely vector-quantized (VQ) versions (replicas) of previous and subsequent frames. When a loss burst occurs, the lost frames can be reconstructed from the VQ replicas. To mitigate the degradation introduced by the coarse quantization of the replicas, a model-based MMSE estimation is applied. Experimental results show that, under a strongly degraded channel, word accuracy reaches up to 83.31% with only 4 FEC bits per packet, or 88.47% with 8 FEC bits, whereas the Aurora mitigation algorithm achieves only 76.98%.
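The idea of refining a coarse VQ replica with an MMSE estimate can be illustrated with a minimal sketch. It is not the paper's implementation: the toy two-dimensional features, both codebooks, the single-frame loss pattern, and the helper names (`nearest`, `train_mmse_table`, `conceal`) are all assumptions made for illustration. Here the "model" is simply a table of conditional means E[x_t | replica index of x_t, fine index of the previous received frame] estimated from training data, whereas the paper's model-based estimator may differ.

```python
import math
from collections import defaultdict

def nearest(codebook, x):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)))

def train_mmse_table(frames, fine_cb, coarse_cb):
    """Tabulate E[x_t | replica index of x_t, fine index of x_{t-1}] (assumed model)."""
    sums = {}
    counts = defaultdict(int)
    for prev, cur in zip(frames, frames[1:]):
        key = (nearest(coarse_cb, cur), nearest(fine_cb, prev))
        acc = sums.setdefault(key, [0.0] * len(cur))
        for d, v in enumerate(cur):
            acc[d] += v
        counts[key] += 1
    return {k: [s / counts[k] for s in acc] for k, acc in sums.items()}

def conceal(replica_idx, prev_idx, table, coarse_cb):
    """MMSE estimate of a lost frame; falls back to the coarse codeword if the context is unseen."""
    return table.get((replica_idx, prev_idx), list(coarse_cb[replica_idx]))

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Toy, slowly varying "feature" trajectory (hypothetical data, not Aurora-2 features).
frames = [(math.sin(0.1 * t), math.cos(0.07 * t)) for t in range(500)]
# Fine codebook stands in for the normal frame quantizer; coarse one is a 2-bit replica.
fine_cb = [(x / 4.0, y / 4.0) for x in range(-4, 5) for y in range(-4, 5)]
coarse_cb = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5)]

table = train_mmse_table(frames, fine_cb, coarse_cb)

# Pretend each frame in turn is lost while its replica and the previous frame arrive.
err_replica = err_mmse = 0.0
for prev, cur in zip(frames, frames[1:]):
    r, p = nearest(coarse_cb, cur), nearest(fine_cb, prev)
    err_replica += mse(coarse_cb[r], cur)                  # decode the replica directly
    err_mmse += mse(conceal(r, p, table, coarse_cb), cur)  # replica refined by MMSE
n = len(frames) - 1
print(f"replica-only MSE: {err_replica / n:.4f}, MMSE-refined MSE: {err_mmse / n:.4f}")
```

Because the conditional mean minimizes squared error among all estimators that see the same context, the refined error on the data used to build the table cannot exceed that of decoding the replica directly. On held-out data, and for the longer loss bursts the paper targets, the gain would depend on how well the model matches the speech features.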