Efficient Acoustic Feature Extraction for Music Information Retrieval Using Programmable Gate Arrays

Many of the recent advances in music information retrieval from audio signals have been data-driven, i.e., resulting from the analysis of very large data sets. Widespread performance evaluations on common data sets, such as the annual MIREX events, have also been instrumental in advancing the field. These endeavors incur a large computational cost and could benefit greatly from more rapid calculation of acoustic features. Traditional cluster-based solutions for large-scale feature extraction are expensive and inefficient in both space and power. Using the massively parallel architecture of the field-programmable gate array (FPGA), it is possible to design an application-specific chip that rivals the speed of a cluster for large-scale acoustic feature computation at lower cost. Recent advances in development tools, such as the Xilinx Blockset in Simulink, allow rapid prototyping, simulation, and implementation on actual hardware. Such devices also show potential for implementing MIR systems on embedded devices such as cell phones and PDAs, where hardware acceleration would be an absolute necessity. We present a prototype library for acoustic feature calculation for implementation on Xilinx FPGA hardware. Furthermore, using a genre classification task, we compare the performance of simulated hardware features to those computed using standard methods, demonstrating a nearly negligible drop in classification performance with the potential for large reductions in computation time.
