Phase recovery by unwrapping: applications to music signal processing

This paper introduces a novel technique for reconstructing the phase of modified spectrograms of audio signals. From the analysis of mixtures of sinusoids, we derive relationships between the phases of successive time frames in the Time-Frequency (TF) domain. Instantaneous frequencies are estimated locally so that the method also handles non-stationary signals such as vibratos. This technique enforces the horizontal (temporal) coherence of the partials. The method is tested on a variety of data and outperforms traditional consistency-based approaches. We also introduce an audio restoration framework and obtain results that compete with other state-of-the-art methods. Finally, we apply this phase recovery method to an audio source separation task in which the spectrograms of the isolated components are known, using the phase unwrapping estimate to initialize an iterative source separation procedure. Experiments conducted on realistic music pieces demonstrate the effectiveness of the method for various music signal processing tasks.
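
The core of the approach can be illustrated by a minimal sketch: per frequency bin, the phase of frame t is obtained from the phase of frame t-1 by unwrapping with the estimated instantaneous frequency, phi[k, t] = phi[k, t-1] + 2*pi*nu[k, t]*hop/sr. The sketch below assumes a Python/NumPy setting and a simple quadratic (QIFFT-style) interpolation of log-magnitude peaks as the local instantaneous-frequency estimator; the function name, parameters, and estimator choice are illustrative assumptions, not the paper's exact implementation.

import numpy as np

def unwrap_phase_estimate(mag, hop, n_fft, sr, phase_init=None):
    # mag: (n_bins, n_frames) magnitude spectrogram, n_bins = n_fft // 2 + 1
    # hop: hop size in samples; sr: sample rate in Hz
    # phase_init: optional phase of the first (unmodified) frame
    # Returns a complex STFT whose phases are recovered by horizontal unwrapping.
    n_bins, n_frames = mag.shape
    phase = np.zeros_like(mag)
    if phase_init is not None:
        phase[:, 0] = phase_init
    bin_freqs = np.arange(n_bins) * sr / n_fft

    for t in range(1, n_frames):
        # Local instantaneous-frequency estimate: quadratic interpolation
        # of log-magnitude around spectral peaks of the current frame.
        inst_freq = bin_freqs.copy()
        m = mag[:, t]
        peaks = np.where((m[1:-1] > m[:-2]) & (m[1:-1] > m[2:]))[0] + 1
        for k in peaks:
            a = np.log(m[k - 1] + 1e-12)
            b = np.log(m[k] + 1e-12)
            c = np.log(m[k + 1] + 1e-12)
            delta = 0.5 * (a - c) / (a - 2.0 * b + c - 1e-12)  # fractional bin offset
            inst_freq[k] = (k + delta) * sr / n_fft

        # Horizontal phase propagation (unwrapping over time).
        phase[:, t] = phase[:, t - 1] + 2.0 * np.pi * inst_freq * hop / sr

    return mag * np.exp(1j * phase)

In a source separation setting, such an estimate could serve as the initial phase of each isolated component before running an iterative refinement (for instance a Griffin-Lim-type or consistent Wiener filtering loop), which is the initialization strategy described in the abstract; the specific refinement procedure paired with it here is left unspecified.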
