An Effective Approach for Vocal Melody Extraction from Polyphonic Music on GPU

Melody extraction from polyphonic music is a valuable but difficult problem in music information retrieval, and its large computational cost limits its application. With their growing number of processing cores and increased bandwidth, GPUs have become an ideal platform for fine-grained parallel algorithms. In this paper, we present a parallel approach to salience-based melody extraction from polyphonic music using CUDA. For a 21-second polyphonic clip, the extraction time is cut from 3 seconds to 33 milliseconds on an NVIDIA GeForce GTX 480, a speedup of up to 100 times. This performance makes melody extraction practical for real-time applications and makes evaluation on very large datasets feasible. We give insight into how such significant speed gains are achieved and encourage the development and adoption of GPUs in the music information retrieval field.
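The timing figures quoted above can be checked with a little arithmetic. The sketch below is illustrative only: it uses the rounded numbers from the abstract, assumes the 3-second figure is the baseline (non-GPU) extraction time, and the helper names `speedup` and `real_time_factor` are hypothetical, not part of the paper's implementation.

```python
# Rounded figures quoted in the abstract.
clip_seconds = 21.0       # duration of the polyphonic clip
baseline_seconds = 3.0    # extraction time before GPU acceleration (assumed baseline)
gpu_seconds = 0.033       # extraction time on the GeForce GTX 480


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time per second of audio; below 1.0 is faster than real time."""
    return processing_s / audio_s


gpu_speedup = speedup(baseline_seconds, gpu_seconds)
baseline_rtf = real_time_factor(baseline_seconds, clip_seconds)
gpu_rtf = real_time_factor(gpu_seconds, clip_seconds)

print(f"speedup: {gpu_speedup:.1f}x")
print(f"baseline real-time factor: {baseline_rtf:.4f}")
print(f"GPU real-time factor: {gpu_rtf:.4f}")
```

With these rounded inputs the speedup comes out at roughly 91 times, consistent with the "up to 100 times" figure, and the GPU run spends about 1.6 milliseconds of processing per second of audio.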
