Paper information - Matrix Quantization and LPC Vocoder Based Linear Predictive for Low-Resource Speech Recognition system

Over the last ten years, there has been significant progress in the use of low-rate speech coders in voice applications for computers, military communications, and civil communications. This advancement has been made possible by the development of new speech coders that can generate high-quality speech at low data rates. The majority of existing coders include a spectral representation of speech, speech waveform matching, and “optimization” of the coder’s performance for human hearing. The goal of this paper is to provide a thorough evaluation of voice coding methods for educational purposes, with a particular emphasis on the algorithms used in low-rate cellular communication standards. The algorithm we developed using a voice-excited LPC vocoder produces clear, low-distortion results. Ordinary LPCs, on the other hand, fall short of vocoders because they can handle signals other than speech, such as music. To improve quality, additional bandwidth is used to reduce the bit rate. To improve the quality, we tried two approaches. The first was to increase the number of bits used to quantize the DCT coefficients. This coefficient would outperform the inverse DCT in closer error rearrangements. The second was to increase the total number of quantized coefficients. As a result, error array rearrangements would be more accurate. The goal is to identify the point at which a method improvement outperforms the previous, better result. Other coding methods become more complex, but this vocoder suffices.
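The two quality levers named in the abstract (more bits per DCT coefficient, and more quantized coefficients) can be illustrated with a minimal sketch. It assumes, which the abstract does not state, that the DCT is applied to the LPC prediction-error frame, as is common in voice-excited LPC. It uses a plain uniform quantizer and a synthetic noise frame in place of a real residual, so it is not the authors' implementation. The names `dct_matrix`, `code_frame`, and `frame_mse` are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Return the orthonormal DCT-II matrix; its transpose is the inverse DCT."""
    k = np.arange(n)[:, None]   # coefficient index (rows)
    m = np.arange(n)[None, :]   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)     # scale the DC row so that c @ c.T == I
    return c


def code_frame(frame, n_coeffs, n_bits):
    """Keep the first n_coeffs (>= 1) DCT coefficients, quantize each with
    n_bits using a uniform quantizer (an assumption), and reconstruct."""
    frame = np.asarray(frame, dtype=float)
    c = dct_matrix(len(frame))
    coeffs = c @ frame                      # forward DCT
    kept = coeffs[:n_coeffs]                # low-frequency coefficients only
    peak = np.max(np.abs(kept))
    if peak == 0.0:
        peak = 1.0                          # avoid a zero step on silent frames
    levels = 2 ** n_bits
    step = 2.0 * peak / levels              # quantizer covers [-peak, peak]
    idx = np.clip(np.floor(kept / step), -(levels // 2), levels // 2 - 1)
    decoded = np.zeros_like(coeffs)         # discarded coefficients stay zero
    decoded[:n_coeffs] = (idx + 0.5) * step
    return c.T @ decoded                    # inverse DCT


def frame_mse(frame, n_coeffs, n_bits):
    """Mean squared reconstruction error for one frame."""
    frame = np.asarray(frame, dtype=float)
    return float(np.mean((frame - code_frame(frame, n_coeffs, n_bits)) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(160)        # stand-in for a 20 ms frame at 8 kHz
    for n_coeffs, n_bits in [(40, 3), (40, 6), (80, 6)]:
        bits_per_frame = n_coeffs * n_bits
        print(n_coeffs, n_bits, bits_per_frame, frame_mse(frame, n_coeffs, n_bits))
```

Both levers raise the bits spent per frame (`n_coeffs * n_bits`), so comparing the error at equal bit budgets is one way to look for the crossover point the abstract describes.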

Shakila Basheer | Surbhi Bhatia | Ankit Kumar | Neeraj Varshney | T. Reddy

[1] Oldrich Slavata, et al. Low Bit-Rate Coded Speech Intelligibility Testing in Czech Language Using Parallel Task, 2020.

[2] P. S. Sathidevi, et al. Design of MELPe-Based Variable-Bit-Rate Speech Coding with Mel Scale Approach Using Low-Order Linear Prediction Filter and Representing Excitation Signal Using Glottal Closure Instants, 2020.

[3] Cheng-Yu Yeh, et al. A Search Complexity Improvement of Vector Quantization to Immittance Spectral Frequency Coefficients in AMR-WB Speech Codec, 2016, Symmetry.

[4] Milos Cernak, et al. Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[5] Mohamed Debyeche, et al. An efficient low bit-rate compression scheme of acoustic features for distributed speech recognition, 2016, Comput. Electr. Eng.

[6] Joon‐Hyuk Chang, et al. Efficient implementation techniques of an SVM-based speech/music classifier in SMV, 2015, Multimedia Tools and Applications.

[7] Mattias Nilsson, et al. On entropy-constrained vector quantization using gaussian mixture models, 2008, IEEE Transactions on Communications.

[8] Pradeepa Yahampath, et al. Multiple-Description Predictive-Vector Quantization With Applications to Low Bit-Rate Speech Coding Over Networks, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[9] Udaya Bhaskar, et al. Low bit-rate voice compression based on frequency domain interpolative techniques, 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[10] Kuldip K. Paliwal, et al. Scalable distributed speech recognition using Gaussian mixture model-based block quantisation, 2006, Speech Commun.

[11] Bachir Boudraa, et al. Optimized trellis coded vector quantization of LSF parameters, application to the 4.8kbps FS1016 speech coder, 2005, Signal Process.

[12] Milan Jelinek, et al. Signal modification method for variable bit rate wide-band speech coding, 2005, IEEE Transactions on Speech and Audio Processing.

[13] Hideyuki Nomura, et al. Dependency of Distortion on Output Binary Pattern of the Hidden Layer for a Noisy LSP Quantization Neural Network, 2003, IEICE Trans. Inf. Syst.

[14] Jhing-Fa Wang, et al. Chip design of portable speech memopad suitable for persons with visual disabilities, 2002, IEEE Trans. Speech Audio Process.

[15] Tim Fingscheidt, et al. Joint source-channel (de-)coding for mobile communications, 2002, IEEE Trans. Commun.

[16] Juan M. López-Soler, et al. Linear inter-frame dependencies for very low bit-rate speech coding, 2001, Speech Commun.

[17] Richard V. Cox, et al. A very low bit rate speech coder based on a recognition/synthesis paradigm, 2001, IEEE Trans. Speech Audio Process.

[18] Branka Vucetic, et al. Optimum Source Codec Design in Coded Systems and Its Application for Low-Bit-Rate Speech Transmission, 2000.

[19] Amir K. Khandani, et al. Symbol-based turbo codes, 1999, IEEE Communications Letters.

[20] Deepen Sinha, et al. Speech data compression through sparse coding of innovations, 1994, IEEE Trans. Speech Audio Process.

[21] Malcolm J. Hawksford, et al. Characterization of Communications Systems Using a Speechlike Test Stimulus, 1993.

[22] M. M. Lara-Barron, et al. Packet-based embedded encoding for transmission of low-bit-rate-encoded speech in packet networks, 1992.

[23] Andrew Perkis, et al. Joint source and channel trellis coding of line spectrum pair parameters, 1992, Speech Commun.

[24] Carl-Erik W. Sundberg, et al. Subband speech coding and matched convolutional channel coding for mobile radio channels, 1991, IEEE Trans. Signal Process.

[25] Marco Tagliasacchi, et al. SoundStream: An End-to-End Neural Audio Codec, 2022, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[26] Sean A. Ramprashad, et al. Sparse Bit-Allocations Based on Partial Ordering Schemes With Application to Speech and Audio Coding, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[27] Nariman Farvardin, et al. Variable-rate finite-state vector quantization and applications to speech and image coding, 1993, IEEE Trans. Speech Audio Process.