Paper information - A subspace based progressive coding method for speech compression


Abstract: In this study, two novel methods for coding speech signals are proposed, one based on the Karhunen-Loève Transform (KLT) and the other on Independent Component Analysis (ICA). Rather than working directly with eigenvalue magnitudes, the KLT- and ICA-based methods take the eigenvectors of covariance matrices (or the independent components, for ICA) and geometrically group them into a smaller number of vectors, which yields a more compact data representation. Further compression is achieved by discarding the autocovariance eigenvectors that correspond to small eigenvalues and applying vector quantization to the remaining eigenvectors. In addition, the study proposes an iterative error refinement process that uses the rest of the available bandwidth to transmit an efficient representation of the description error, improving the signal-to-noise ratio (SNR). The overall process constitutes a new approach to efficient speech coding, with ICA being used in subspace speech coding for the first time. Both constant bit rate (CBR) and variable bit rate (VBR) coding algorithms are employed with the proposed methods. The TIMIT speech database is used in the experiments. Speech signals are synthesized at 2.4 kbps, 8 kbps, 12.2 kbps, 16 kbps, 16.4 kbps and 19.85 kbps using various frame lengths. The quality of the synthesized speech is compared with that of existing speech codecs, namely LPC (2.4 kbps), G.729A (CS-CELP, 8 kbps), AMR-NB (12.2 kbps), G.728 (LD-CELP, 16 kbps), EVS (16.4 kbps) and AMR-WB (19.85 kbps).
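The subspace idea in the abstract (keep only the covariance eigenvectors with the largest eigenvalues, then reconstruct from the reduced representation) can be illustrated with a minimal sketch. This is not the authors' codec: the geometric grouping of eigenvectors, the vector quantization, the ICA variant and the iterative error refinement are all omitted. The function names (`klt_basis`, `klt_encode`, `klt_decode`, `snr_db`, `demo`), the synthetic AR(1) test signal standing in for speech, and the frame length are assumptions made purely for illustration.

```python
import numpy as np


def klt_basis(frames, k):
    """Return the frame mean and the k leading eigenvectors of the frame covariance."""
    mean = frames.mean(axis=0)
    cov = np.cov(frames - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
    return mean, eigvecs[:, order]


def klt_encode(frames, mean, basis):
    """Project mean-removed frames onto the retained eigenvectors."""
    return (frames - mean) @ basis


def klt_decode(coeffs, mean, basis):
    """Reconstruct frames from their subspace coefficients."""
    return coeffs @ basis.T + mean


def snr_db(original, reconstructed):
    """Signal-to-noise ratio of the reconstruction, in dB."""
    noise = original - reconstructed
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(noise ** 2))


def demo(frame_len=32, n_frames=400, seed=0):
    """Code a synthetic correlated signal (assumed stand-in for speech) at several subspace sizes."""
    rng = np.random.default_rng(seed)
    n = frame_len * n_frames
    excitation = rng.standard_normal(n)
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = 0.95 * x[i - 1] + excitation[i]  # AR(1) process: strongly correlated samples
    frames = x.reshape(n_frames, frame_len)

    results = {}
    for k in (4, 8, 16):
        mean, basis = klt_basis(frames, k)
        rec = klt_decode(klt_encode(frames, mean, basis), mean, basis)
        results[k] = snr_db(frames, rec)
    return results


if __name__ == "__main__":
    for k, snr in demo().items():
        print(f"k={k:2d} eigenvectors kept -> SNR = {snr:.2f} dB")
```

Keeping more eigenvectors raises the SNR at the cost of more coefficients per frame. That is the rate-quality trade-off the abstract addresses by discarding eigenvectors with small eigenvalues and spending any remaining bandwidth on the description error.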

Erol Seke | Ömer Nezih Gerek | M. Bilginer Gülmezoglu | Serkan Keser
