Score-Informed Source Separation for Multichannel Orchestral Recordings

This paper proposes a system for score-informed audio source separation of multichannel orchestral recordings. The orchestral music repertoire relies on the existence of scores, so a reliable separation requires a good alignment of the score with the audio of the performance. To this end, automatic score alignment methods are reliable when a tolerance window is allowed around the actual note onsets and offsets. Several factors increase the difficulty of our task: a highly reverberant image, large ensembles with rich polyphony, and a large variety of instruments recorded with a distant-microphone setup. To address these problems, we design context-specific methods, such as refining the output of the score follower to obtain a more precise alignment. Moreover, we extend a close-microphone separation framework to deal with distant-microphone orchestral recordings. We then propose the first open evaluation dataset in this musical context, including annotations of the notes played by multiple instruments of an orchestral ensemble. The evaluation analyzes how the main components of the separation framework interact and how they affect the quality of separation. Results show that we are able to align the original score with the audio of the performance and to separate the sources corresponding to the instrument sections.
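To illustrate the tolerance-window idea, the following minimal sketch builds a binary pitch-by-frame mask from aligned score notes, widening each note by a fixed tolerance on both sides. It is not taken from the paper: the function name `score_activation_mask`, the note format (onset in seconds, offset in seconds, MIDI pitch), the frame rate, and the 0.25 s tolerance are all assumptions chosen for illustration.

```python
import numpy as np

def score_activation_mask(notes, n_frames, frame_rate, tol=0.25):
    """Binary (128 x n_frames) mask of where each pitch may be active.

    notes: iterable of (onset_s, offset_s, midi_pitch) from an aligned score.
    tol:   tolerance in seconds added before the onset and after the offset
           (hypothetical value, not from the paper).
    """
    mask = np.zeros((128, n_frames), dtype=bool)
    for onset, offset, pitch in notes:
        # Widen the note interval by the tolerance and clip to the signal.
        start = max(0, int(np.floor((onset - tol) * frame_rate)))
        end = min(n_frames, int(np.ceil((offset + tol) * frame_rate)))
        mask[pitch, start:end] = True
    return mask

# Two overlapping notes (C4 and E4) at 100 frames per second.
notes = [(1.0, 2.0, 60), (1.5, 2.5, 64)]
mask = score_activation_mask(notes, n_frames=400, frame_rate=100.0, tol=0.25)
```

Such a mask could, for instance, restrict where a separation model is allowed to place energy for each note; a later refinement step would then tighten the note boundaries inside the window.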
