Paper Information - Optimized MFCC feature extraction on GPU

Optimized MFCC feature extraction on GPU

In this paper, we update our previous research on Mel-Frequency Cepstral Coefficient (MFCC) feature extraction [1] and describe the optimizations required to improve throughput on Graphics Processing Units (GPUs). We demonstrate that feature extraction is well suited to GPUs and that performing it on these platforms substantially reduces computation time, and we discuss the optimized algorithm. Using one GTX580 GPU, our approach is shown to be approximately 97x faster than a sequential CPU implementation, enabling feature extraction to be performed at under 0.01% of real time. This is significantly faster than previously reported results for implementations on GPUs, DSPs and FPGAs. Furthermore, we demonstrate that multiple MFCC features can be generated for a set of predefined Vocal Tract Length Normalization (VTLN) alpha parameters with little degradation in throughput, and we describe optimizations for the filter bank and reduction steps.
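As a rough illustration of the filter bank stage mentioned in the abstract, the sketch below builds one triangular mel filter bank per VTLN alpha value and applies it to a power spectrum. It is not the authors' GPU implementation: the function names (`mel_filter_bank`, `log_mel_energies`), the 2595/700 mel formula, and the simple "scale by alpha and clip at Nyquist" warp are all assumptions chosen for brevity, and the code runs on the CPU in plain Python.

```python
import math

def hz_to_mel(f_hz):
    # Common mel-scale formula (an assumption; the paper's variant is not stated here).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(num_filters, fft_size, sample_rate, alpha=1.0):
    """Return num_filters rows of triangular weights over FFT bins 0..fft_size//2."""
    nyquist = sample_rate / 2.0
    num_bins = fft_size // 2 + 1
    mel_max = hz_to_mel(nyquist)
    # num_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(mel_max * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    # Hypothetical VTLN warp: scale every edge by alpha and clip at Nyquist.
    edges_hz = [min(alpha * f, nyquist) for f in edges_hz]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for k in range(num_bins):
            f = k * sample_rate / fft_size  # centre frequency of bin k
            if lo < f <= mid:
                w = (f - lo) / (mid - lo)   # rising edge of the triangle
            elif mid < f < hi:
                w = (hi - f) / (hi - mid)   # falling edge of the triangle
            else:
                w = 0.0
            row.append(w)
        bank.append(row)
    return bank

def log_mel_energies(power_spectrum, bank):
    # Weighted sum per filter, floored to avoid log(0).
    return [math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), 1e-10))
            for row in bank]

# One precomputed bank per alpha; each is applied to the same power spectrum.
alphas = [0.9, 1.0, 1.1]
banks = {a: mel_filter_bank(26, 512, 16000, alpha=a) for a in alphas}
flat_spectrum = [1.0] * 257
features = {a: log_mel_energies(flat_spectrum, banks[a]) for a in alphas}
```

Because the banks depend only on alpha and not on the audio, they can be built once up front, so the per-frame cost of supporting several alpha values is limited to the extra weighted sums.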

Ian R. Lane | Weijia Shang | Jike Chong | Haofeng Kou

[1] S. Buthpitiya, et al. A parallel implementation of Viterbi training for acoustic models using graphics processing units, 2012, 2012 Innovative Parallel Computing (InPar).

[2] Tatsuya Kawahara, et al. Recent Development of Open-Source Speech Recognition Engine Julius, 2009.

[3] Pietro Laface, et al. Parallel implementation of Artificial Neural Network training for speech recognition, 2010, Pattern Recognit. Lett.

[4] Gerald Friedland, et al. Parallelizing Speaker-Attributed Speech Recognition for Meeting Browsing, 2010, 2010 IEEE International Symposium on Multimedia.

[5] Melvyn J. Hunt, et al. Spectral Signal Processing for ASR, 2007.

[6] Wonyong Sung, et al. Parallel scalability in speech recognition, 2009, IEEE Signal Processing Magazine.

[7] Weijia Shang, et al. Efficient MFCC feature extraction on Graphics Processing Units, 2013.

[8] Franz Franchetti, et al. Discrete Fourier transform on multicore, 2009, IEEE Signal Processing Magazine.

[9] Youngmoo E. Kim, et al. Efficient Acoustic Feature Extraction for Music Information Retrieval Using Programmable Gate Arrays, 2009, ISMIR.

[10] B. Venkataramani, et al. Hardware Implementation of Real-Time Speech Recognition System Using TMS320C6713 DSP, 2011, 2011 24th International Conference on VLSI Design.

[11] Kevin Skadron, et al. Scalable parallel programming, 2008, 2008 IEEE Hot Chips 20 Symposium (HCS).