Online/offline score informed music signal decomposition: application to minus one

In this paper, we propose a score-informed source separation framework based on non-negative matrix factorization (NMF) and dynamic time warping (DTW) that is suitable for both offline and online systems. The proposed framework is composed of three stages: training, alignment, and separation. In the training stage, the score is encoded as a sequence of individual occurrences and unique combinations of notes, denoted as score units. Then, we propose an NMF-based signal model in which the basis function of each score unit is represented as a weighted combination of the spectral patterns for each note and instrument in the score, obtained from an over-complete dictionary trained a priori. In the alignment stage, the time-varying gains are estimated at frame level by projecting each score unit basis function onto the captured audio signal. Then, under the assumption that only one score unit is active at a time, we propose an online DTW scheme to synchronize the score information with the performance. Finally, in the separation stage, the obtained gains are refined using local low-rank NMF, and the separated sources are obtained using a soft-filter strategy. The framework has been evaluated and compared with other state-of-the-art methods for single-channel source separation of small ensembles and large orchestral ensembles, obtaining reliable results in terms of signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR). In addition, our method has been evaluated in the specific task of acoustic minus one (removing one instrument from the recording so that a performer can play along with the remaining accompaniment), and some demos are presented.
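To make the training and alignment stages more concrete, the following Python sketch builds a score-unit basis as a weighted sum of per-note spectral patterns, estimates a frame-level gain by least-squares projection, and feeds the resulting per-unit costs to a causal DTW recursion. It is a minimal illustration under stated assumptions, not the authors' implementation: the Euclidean projection cost (which may differ from the divergence used in the full paper), the restricted DTW step pattern (stay on the current unit or advance to the next one, mirroring the one-active-unit assumption), the toy spectra, and every name (`patterns`, `unit_basis`, `projection_cost`, `CausalDTW`) are hypothetical.

```python
import math


def unit_basis(unit, patterns, weights=None):
    """Basis function of one score unit: a weighted sum of the spectral
    patterns of every (instrument, note) pair it contains (weights default to 1)."""
    n_bins = len(next(iter(patterns.values())))
    basis = [0.0] * n_bins
    for pair in unit:
        w = 1.0 if weights is None else weights.get(pair, 1.0)
        for f, value in enumerate(patterns[pair]):
            basis[f] += w * value
    return basis


def projection_cost(frame, basis):
    """Non-negative least-squares gain of a single basis vector and the
    relative residual energy ||x - g*b||^2 / ||x||^2 left by that projection."""
    bb = sum(b * b for b in basis)
    if bb == 0.0:
        return 0.0, 1.0
    gain = max(0.0, sum(x * b for x, b in zip(frame, basis)) / bb)
    residual = sum((x - gain * b) ** 2 for x, b in zip(frame, basis))
    energy = sum(x * x for x in frame) or 1.0
    return gain, residual / energy


class CausalDTW:
    """DTW computed one column per incoming frame. Each frame is assigned to
    exactly one score unit and the path may only stay or advance by one unit,
    so every accumulated cost sums the same number of terms."""

    def __init__(self, n_units):
        self.n_units = n_units
        self.prev = None  # accumulated costs after the previous frame

    def step(self, costs):
        if self.prev is None:
            # Assumption: the performance starts at the first score unit.
            col = [costs[0]] + [math.inf] * (self.n_units - 1)
        else:
            col = [costs[m] + min(self.prev[m], self.prev[m - 1] if m > 0 else math.inf)
                   for m in range(self.n_units)]
        self.prev = col
        # Current position estimate: the unit with the lowest accumulated cost.
        return min(range(self.n_units), key=col.__getitem__)


# Toy dictionary: one non-negative 4-bin spectral pattern per (instrument, MIDI note).
patterns = {
    ("violin", 69): [1.0, 0.0, 0.5, 0.0],
    ("violin", 71): [0.0, 1.0, 0.0, 0.5],
    ("cello", 48): [0.2, 0.0, 0.0, 1.0],
}
# Score units: a single note, then a violin + cello chord, then another single note.
score_units = [
    frozenset({("violin", 69)}),
    frozenset({("violin", 69), ("cello", 48)}),
    frozenset({("violin", 71)}),
]
bases = [unit_basis(u, patterns) for u in score_units]

# Synthetic "performance": units 0, 0, 1, 1, 2 played at varying loudness.
true_path, loudness = [0, 0, 1, 1, 2], [2.0, 1.0, 1.0, 0.5, 3.0]
frames = [[g * b for b in bases[m]] for m, g in zip(true_path, loudness)]

dtw = CausalDTW(len(score_units))
positions = []
for frame in frames:  # frames arrive one at a time, as in an online system
    costs = [projection_cost(frame, basis)[1] for basis in bases]
    positions.append(dtw.step(costs))
print(positions)  # [0, 0, 1, 1, 2]
```

Because each step only needs the previous column, the work per frame is constant; a practical online aligner would usually also restrict the search to a window around the current position rather than scanning every score unit.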

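The separation stage described in the abstract can be illustrated in the same spirit. The following Python sketch refines the per-note gains of one frame with multiplicative NMF updates while the dictionary stays fixed, restricts them to the notes of the aligned score unit by zero initialisation, and then builds soft masks from each instrument's reconstruction. It is a simplified stand-in for the local low-rank NMF refinement and the soft-filter strategy, not the authors' code: the Euclidean update rule, the mask exponent `power=2.0` (a Wiener-like mask), the toy data, and the names `refine_gains`, `soft_masks`, and `separate` are assumptions made for illustration.

```python
def refine_gains(frame, patterns, active, n_iter=50, eps=1e-12):
    """Multiplicative updates h <- h * (W^T x) / (W^T W h) for one magnitude
    frame with the dictionary W fixed. Notes outside `active` start at zero,
    and multiplicative updates never move a zero gain away from zero."""
    keys = list(patterns)
    bins = range(len(frame))
    h = {k: (1.0 if k in active else 0.0) for k in keys}
    for _ in range(n_iter):
        model = [sum(h[k] * patterns[k][f] for k in keys) for f in bins]
        h = {k: h[k] * sum(patterns[k][f] * frame[f] for f in bins)
             / (sum(patterns[k][f] * model[f] for f in bins) + eps)
             for k in keys}
    return h


def soft_masks(estimates, power=2.0, eps=1e-12):
    """Per-bin masks est_j^p / sum_k est_k^p; they sum to (almost) one
    wherever at least one source estimate is non-zero."""
    n_bins = len(next(iter(estimates.values())))
    masks = {s: [0.0] * n_bins for s in estimates}
    for f in range(n_bins):
        total = sum(est[f] ** power for est in estimates.values()) + eps
        for s, est in estimates.items():
            masks[s][f] = est[f] ** power / total
    return masks


def separate(mixture, patterns, active, power=2.0):
    """Split one complex STFT frame into per-instrument frames."""
    magnitude = [abs(z) for z in mixture]
    h = refine_gains(magnitude, patterns, active)
    instruments = {inst for inst, _ in patterns}
    # Each instrument's magnitude estimate: sum of its notes' gain * pattern.
    estimates = {
        inst: [sum(h[k] * patterns[k][f] for k in patterns if k[0] == inst)
               for f in range(len(mixture))]
        for inst in instruments
    }
    masks = soft_masks(estimates, power)
    return {inst: [m * z for m, z in zip(masks[inst], mixture)] for inst in masks}


# Toy dictionary: one non-negative 4-bin spectral pattern per (instrument, MIDI note).
patterns = {
    ("violin", 69): [1.0, 0.0, 0.5, 0.0],
    ("violin", 71): [0.0, 1.0, 0.0, 0.5],
    ("cello", 48): [0.2, 0.0, 0.0, 1.0],
}
# Mixture frame: violin A4 at gain 1 plus cello C3 at gain 2 (toy complex STFT values).
mixture = [1.4 + 0j, 0j, 0.5j, -2.0 + 0j]
sources = separate(mixture, patterns, active={("violin", 69), ("cello", 48)})
for inst in sorted(sources):
    print(inst, [round(abs(z), 3) for z in sources[inst]])
```

Since the masks sum to one wherever any estimate is non-zero, the separated frames add back up to the mixture; bins where every estimate is zero are simply discarded.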