Constrained non-negative matrix factorization for score-informed piano music restoration

In this work, we propose a constrained non-negative matrix factorization (NMF) method for the audio restoration of piano music using information from the score. In the first stage (instrument training), spectral patterns for the target source (piano) are learned from a dataset of isolated piano notes. The piano model is constrained to be harmonic so that each pattern defines a single pitch. In the second stage (noise training), spectral patterns for the undesired source (noise) are learned from the most common types of vinyl noise. To obtain a representative model of the vinyl noise, we use a cross-correlation-based constraint that minimizes the cross-talk between different noise components. In the final stage (separation), we use the trained instrument and noise models in an NMF framework to separate the clean audio signal from the undesired non-stationary noise. To improve the separation results, we propose a novel score-based constraint that prevents activations of notes or note combinations that are not present in the original score. The proposed approach has been evaluated and compared with commercial audio restoration software, obtaining competitive results.
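The separation stage can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the standard multiplicative update for the Kullback-Leibler divergence (the abstract does not state which cost function is used), fixed pre-trained bases, a binary score mask that zeroes note activations absent from the score, and a soft-mask reconstruction. The function name `separate_piano` and all parameter names are hypothetical.

```python
import numpy as np

def separate_piano(V, W_piano, W_noise, score_mask, n_iter=100, eps=1e-12, seed=0):
    """Sketch of score-constrained NMF separation with fixed bases.

    V          : (F, T) non-negative magnitude spectrogram of the noisy mixture
    W_piano    : (F, P) pre-trained piano patterns (assumed one per pitch)
    W_noise    : (F, K) pre-trained noise patterns
    score_mask : (P, T) binary matrix, 1 where the score allows a note
    """
    rng = np.random.default_rng(seed)
    W = np.hstack([W_piano, W_noise])        # bases stay fixed during separation
    P = W_piano.shape[1]
    H = rng.random((W.shape[1], V.shape[1])) + eps
    # Assumed form of the score constraint: entries set to zero here
    # remain zero under multiplicative updates.
    H[:P] *= score_mask
    ones = np.ones_like(V)
    for _ in range(n_iter):
        V_hat = W @ H + eps
        # KL-divergence multiplicative update for the activations only
        H *= (W.T @ (V / V_hat)) / (W.T @ ones + eps)
    piano_hat = W_piano @ H[:P]
    noise_hat = W_noise @ H[P:]
    # Soft mask: share of each time-frequency bin attributed to the piano
    soft_mask = piano_hat / (piano_hat + noise_hat + eps)
    return soft_mask * V, H
```

In this sketch the noise activations are left unconstrained, so the noise model can absorb energy at any time frame, while the piano model can only explain frames in which the score permits the corresponding pitch.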
