Paper information - Time-domain coding of (near) toll quality speech at rates below 16 kb/s (linear prediction, multi-pulse)

The use of digital techniques for organizing communication networks and for representing signals results in systems that are superior to their analog counterparts in terms of quality and reliability. During the last decade many interesting speech coding algorithms have been proposed, but only the advent of fast and economical VLSI components has made it practical to implement speech coding techniques in real-time systems. This thesis presents a study of efficient time-domain algorithms for encoding speech signals at (near) telephone quality at bit rates below 16 kb/s. A Delayed Decision Coding (DDC) system using adaptive predictive techniques, which remove the correlation in the speech signal, is used as the basic structure. Procedures for determining the excitation sequence within such a DDC structure are investigated. One such procedure, called Multi-Pulse Excitation (MPE) coding (Atal and Remde, 1982), is studied in detail, and techniques for improving the performance of this coder are described. In the course of the work a new coding concept called Regular-Pulse Excitation (RPE) coding was developed, and it is demonstrated that this technique produces speech with a quality comparable to existing methods such as the MPE coder, but with a much lower complexity. To provide a reference for the parametric approaches (MPE and RPE), we describe the results of simulations with a codebook approach for finding the optimal excitation sequences. We demonstrate that with some modifications this codebook approach yields, at lower bit rates, a quality similar to that of the parametric approaches, but at a much greater complexity. Quantization procedures for the coder parameters are described, and efficient procedures for encoding the pulse positions of the multi-pulse excitation signal have been developed. Furthermore, different methods for the quantization of the filter coefficients have been investigated.
An investigation of the RPE and MPE coders demonstrated that both coders provide (near) toll quality speech at 10 kb/s. Coder transparency can be obtained at 16 kb/s, while with the use of vector quantization techniques adequate performance is obtained at 6 kb/s. Further, it is demonstrated that both coders can be successfully applied to the encoding of wide-band speech signals at rates below 32 kb/s. An efficient realization of the proposed RPE coding scheme is obtained by mapping the algorithm onto silicon, using CORDIC processor elements as basic building blocks. Finally, to provide a suitable environment for the investigation of speech coding algorithms, we describe a convenient interactive software package that we developed in the course of the work.

Peter Kroon