Music retiler: Using NMF2D source separation for audio mosaicing

Musaicing (music mosaicing) aims to reconstruct a target music track by superimposing audio samples selected from a collection, based on their acoustic similarity to the target. The baseline technique for this is concatenative synthesis, in which the superposition occurs only in time. Non-negative Matrix Factorization (NMF) has also been proposed for this task: a target spectrogram is factorized into an activation matrix and a predefined basis matrix representing the sample collection, so the superposition occurs in both time and frequency. However, in both methods the samples used for the reconstruction represent isolated sources (such as bee sounds) and remain unchanged during the musaicing (for example, they need to be pitch-shifted beforehand), which limits the applicability of these methods. We propose a variation of musaicing in which the samples used for the reconstruction are obtained by applying an NMF2D separation algorithm to a music collection (such as a collection of Reggae tracks). Using these separated samples, a second NMF2D algorithm then automatically finds the best transposition factors to represent the target. An online perceptual evaluation of our method shows that it outperforms the NMF approach when the sources are polyphonic and multi-source.
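
To make the factorization step concrete, below is a minimal NumPy sketch (our illustration, not the paper's implementation) of the fixed-basis NMF that underlies this style of mosaicing: the basis W built from the sample collection is held fixed, and only the activations H are updated with the standard KL-divergence multiplicative rule. The helper transposed_bases hints at the NMF2D idea in a simplified form: in a log-frequency representation, a pitch transposition is a vertical shift of a template, so stacking shifted copies of W lets the activations also select a transposition factor. All function names, parameters, and shift ranges here are hypothetical.

```python
import numpy as np

def mosaicing_activations(V, W, n_iter=200, eps=1e-10):
    """Estimate activations H so that W @ H approximates the target
    magnitude spectrogram V, keeping the basis W fixed.

    V : (n_bins, n_frames) target magnitude spectrogram
    W : (n_bins, n_templates) basis built from the sample collection
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    denom = W.sum(axis=0)[:, None] + eps        # W^T 1, shape (n_templates, 1)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / denom           # KL-divergence multiplicative update
    return H

def transposed_bases(W, shifts):
    """Stack copies of W shifted along the (log-)frequency axis,
    one block per transposition factor in `shifts` (in frequency bins)."""
    blocks = []
    for s in shifts:
        Ws = np.roll(W, s, axis=0)
        if s > 0:
            Ws[:s, :] = 0.0                     # discard bins wrapped from the bottom
        elif s < 0:
            Ws[s:, :] = 0.0                     # discard bins wrapped from the top
        blocks.append(Ws)
    return np.hstack(blocks)

# Usage sketch: V is a log-frequency magnitude spectrogram of the target,
# W holds templates extracted from the separated samples.
# Wt = transposed_bases(W, shifts=range(-12, 13))  # e.g. +/- 12 bins
# H = mosaicing_activations(V, Wt)
# V_hat = Wt @ H   # mosaic spectrogram; resynthesize e.g. with the target's phase
```

In this simplified view, each column block of the stacked basis corresponds to one transposition of the sample templates, so the estimated activations jointly encode which sample is used, when it is triggered, and by how much it is transposed.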
