Score-Informed Voice Separation For Piano Recordings

The decomposition of a monaural audio recording into musically meaningful sound sources or voices constitutes a fundamental problem in music information retrieval. In this paper, we consider the task of separating a monaural piano recording into two sound sources (or voices) that correspond to the left hand and the right hand. Since the two sources share many physical properties in this scenario, sound separation approaches that identify sources by their spectral envelope are hardly applicable. Instead, we propose a score-informed approach in which explicit note events specified by the score are used to parameterize the spectrogram of a given piano recording. This parameterization then allows us to construct two spectrograms, one containing only the notes of the left hand and one containing only the notes of the right hand. Finally, inverting the two spectrograms yields the separation result. Initial experiments show that our approach, which involves high-resolution music synchronization and parametric modeling techniques, yields good results for real-world, non-synthetic piano recordings.
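To make the pipeline more concrete, the following minimal sketch illustrates the general idea of splitting a magnitude spectrogram by hand using score note events. It is not the parametric model proposed here: it assumes the score is already synchronized to the recording, replaces the fitted note model with fixed harmonic templates, and uses simple soft masking. The names `Note`, `hand_model`, and `separate`, as well as the partial count and the 1/h decay, are hypothetical choices made only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int    # MIDI pitch number
    onset: float  # seconds, assumed already aligned to the recording
    offset: float # seconds
    hand: str     # "left" or "right", as annotated in the score


def midi_to_hz(pitch):
    """Fundamental frequency of a MIDI pitch (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


def hand_model(notes, hand, n_bins, n_frames, bin_hz, frame_sec, n_partials=8):
    """Crude model spectrogram for one hand (illustrative assumption):
    each active note adds energy 1/h at the bin of its h-th partial."""
    model = [[0.0] * n_frames for _ in range(n_bins)]
    for note in notes:
        if note.hand != hand:
            continue
        first = max(0, int(note.onset / frame_sec))
        last = min(n_frames, int(math.ceil(note.offset / frame_sec)))
        for h in range(1, n_partials + 1):
            k = round(h * midi_to_hz(note.pitch) / bin_hz)
            if k >= n_bins:
                break
            for t in range(first, last):
                model[k][t] += 1.0 / h
    return model


def separate(spec, notes, bin_hz, frame_sec, eps=1e-9):
    """Split a magnitude spectrogram spec[bin][frame] into left-hand and
    right-hand parts by soft masking with the two model spectrograms."""
    n_bins, n_frames = len(spec), len(spec[0])
    left = hand_model(notes, "left", n_bins, n_frames, bin_hz, frame_sec)
    right = hand_model(notes, "right", n_bins, n_frames, bin_hz, frame_sec)
    out_left, out_right = [], []
    for k in range(n_bins):
        row_l, row_r = [], []
        for t in range(n_frames):
            total = left[k][t] + right[k][t]
            # Energy that no note explains is split evenly between the hands.
            share = left[k][t] / total if total > eps else 0.5
            row_l.append(share * spec[k][t])
            row_r.append((1.0 - share) * spec[k][t])
        out_left.append(row_l)
        out_right.append(row_r)
    return out_left, out_right


# Toy input: a flat spectrogram, one left-hand and one right-hand note.
notes = [Note(45, 0.0, 0.5, "left"), Note(76, 0.5, 1.0, "right")]
spec = [[1.0] * 10 for _ in range(200)]
left_spec, right_spec = separate(spec, notes, bin_hz=10.0, frame_sec=0.1)
```

In a full system, each of the two masked spectrograms would then be inverted to a time-domain signal, typically by combining it with the phase of the original recording or by using an iterative signal estimation procedure.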
