Optimized MFCC feature extraction on GPU

In this paper, we update our previous work on Mel-Frequency Cepstral Coefficient (MFCC) feature extraction [1] and describe the optimizations required to improve throughput on Graphics Processing Units (GPUs). We demonstrate that feature extraction is well suited to GPUs and that performing it on these platforms substantially reduces computation time, and we also discuss the optimized algorithm. Using a single GTX580 GPU, our approach is approximately 97x faster than a sequential CPU implementation, enabling feature extraction to be performed at under 0.01% of real time. This is significantly faster than previously reported results on GPUs, DSPs, and FPGAs. Furthermore, we demonstrate that multiple MFCC features can be generated for a set of predefined Vocal Tract Length Normalization (VTLN) alpha parameters with little degradation in throughput, and we describe the optimizations applied to the filter bank and reduction steps.
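To make the VTLN claim concrete, the following is a minimal CPU-side sketch in Python, not the GPU implementation described in this paper. It assumes a piecewise-linear frequency warp (a common VTLN formulation, chosen here for illustration) and triangular mel filters. The names `warp`, `filter_bank`, and `features_for_alphas`, the knee position `cut=0.8`, and the default filter count are all hypothetical. The sketch shows one plausible reason the per-alpha cost can stay low: the power spectrum is computed once per frame, and only the filter bank and the stages after it are repeated for each alpha.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def warp(f, alpha, f_max, cut=0.8):
    """Assumed piecewise-linear VTLN warp: scale by alpha below a knee,
    then interpolate linearly so that f_max still maps to f_max."""
    knee = cut * f_max / max(alpha, 1.0)
    if f <= knee:
        return alpha * f
    # Linear segment from (knee, alpha * knee) to (f_max, f_max).
    return alpha * knee + (f_max - alpha * knee) * (f - knee) / (f_max - knee)

def filter_bank(n_filters, n_bins, sample_rate, alpha=1.0):
    """Triangular mel filters evaluated at the warped frequency of each bin."""
    f_max = sample_rate / 2.0
    mel_max = hz_to_mel(f_max)
    # n_filters + 2 edge points, equally spaced on the mel scale.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            f = warp(k * f_max / (n_bins - 1), alpha, f_max)
            if lo < f <= mid:
                w = (f - lo) / (mid - lo)   # rising edge
            elif mid < f < hi:
                w = (hi - f) / (hi - mid)   # falling edge
            else:
                w = 0.0
            row.append(w)
        bank.append(row)
    return bank

def log_mel_energies(power_spectrum, bank):
    """Weighted sum (a reduction) per filter, followed by a floored log."""
    return [math.log(max(sum(w * p for w, p in zip(row, power_spectrum)), 1e-10))
            for row in bank]

def features_for_alphas(power_spectrum, alphas, n_filters=8, sample_rate=16000):
    """Reuse one power spectrum for every alpha; only the filter bank differs."""
    banks = {a: filter_bank(n_filters, len(power_spectrum), sample_rate, a)
             for a in alphas}
    return {a: log_mel_energies(power_spectrum, banks[a]) for a in alphas}
```

In practice the filter banks for a fixed alpha set would be precomputed once rather than rebuilt per frame, and the DCT that turns log mel energies into cepstral coefficients would follow; both are omitted here for brevity.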