Harmonic and percussive source separation using a convolutional auto encoder

Real world audio signals are generally a mixture of harmonic and percussive sounds. In this paper, we present a novel method for separating the harmonic and percussive audio signals from an audio mixture. Proposed method involves the use of a convolutional auto-encoder on a magnitude of the spectrogram to separate the harmonic and percussive signals. This network structure enables automatic high-level feature learning and spectral domain audio decomposition. An evaluation was performed using professionally produced music recording. Consequently, we confirm that the proposed method provides superior separation performance compared to conventional methods.

[1]  Jürgen Schmidhuber,et al.  Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction , 2011, ICANN.

[2]  Daniel P. W. Ellis,et al.  Signal Processing for Music Analysis , 2011, IEEE Journal of Selected Topics in Signal Processing.

[3]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[4]  Honglak Lee,et al.  Unsupervised feature learning for audio classification using convolutional deep belief networks , 2009, NIPS.

[5]  Βασίλης Κατσούρος,et al.  Deploying Nonlinear Image Filters to Spectrogram for Harmonic/Percussive Separation , 2012 .

[6]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[7]  Antoine Liutkus,et al.  Kernel Additive Models for Source Separation , 2014, IEEE Transactions on Signal Processing.

[8]  Mark D. Plumbley,et al.  Phase-based harmonic/percussive separation , 2014, INTERSPEECH.

[9]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[10]  Hirokazu Kameoka,et al.  A Real-time Equalizer of Harmonic and Percussive Components in Music Signals , 2008, ISMIR.

[11]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[12]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[13]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[14]  Yann LeCun,et al.  Moving Beyond Feature Design: Deep Architectures and Automatic Feature Learning in Music Informatics , 2012, ISMIR.

[15]  Antoine Liutkus,et al.  The 2018 Signal Separation Evaluation Campaign , 2018, LVA/ICA.

[16]  L. Daudet,et al.  Harmonic/percussive separation using Kernel Additive Modelling , 2014 .

[17]  Derry Fitzgerald,et al.  Harmonic/Percussive Separation Using Median Filtering , 2010 .

[18]  Hirokazu Kameoka,et al.  Separation of a monaural audio signal into harmonic/percussive components by complementary diffusion on spectrogram , 2008, 2008 16th European Signal Processing Conference.

[19]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[20]  Rémi Gribonval,et al.  Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[21]  Paris Smaragdis,et al.  Deep learning for monaural speech separation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).