Inpainting of Long Audio Segments With Similarity Graphs

We present a novel method for compensating long-duration data loss in audio signals, in particular music. Concealment of such defects is based on a graph that encodes signal structure in terms of time-persistent spectral similarity. An intuitive optimization scheme proposes a suitable candidate segment to substitute for the lost content, and this segment is then smoothly inserted into the gap, i.e., the lost or distorted signal region. Extensive listening tests show that the proposed algorithm provides highly promising results when applied to a variety of real-world music signals.
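
The general idea of similarity-driven gap filling can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes plain magnitude spectra as features, a dense cosine-similarity matrix in place of a sparse graph, and a brute-force search that scores each candidate segment by how well its surroundings match the frames bordering the gap. All function names, parameters (`frame_len`, `hop`, `context`), and the synthetic test signal are hypothetical choices made for this illustration, and the smooth insertion step is omitted.

```python
import numpy as np

def frame_features(signal, frame_len=256, hop=128):
    """Magnitude spectra of Hann-windowed frames, one row per frame (assumed feature)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def similarity_matrix(features):
    """Cosine similarity between all pairs of frames (dense stand-in for a graph)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return unit @ unit.T

def find_candidate(sim, gap_start, gap_end, context=4):
    """Return (start_frame, score) of the segment whose borders best match the gap's.

    The gap covers frames gap_start .. gap_end - 1. A candidate of equal length
    is scored by the similarity of the `context` frames before and after it to
    the `context` frames before and after the gap.
    """
    n = sim.shape[0]
    gap_len = gap_end - gap_start
    best_start, best_score = None, -np.inf
    for start in range(context, n - gap_len - context + 1):
        end = start + gap_len
        # Skip candidates whose extent (with context) touches the gap region.
        if start - context < gap_end + context and end + context > gap_start - context:
            continue
        score = 0.0
        for k in range(1, context + 1):
            score += sim[gap_start - k, start - k]        # frames before
            score += sim[gap_end + k - 1, end + k - 1]    # frames after
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score

# Synthetic example: a noise pattern of 1024 samples (8 hops) repeated 6 times,
# with a stretch of samples zeroed out so that frames 20..23 are damaged.
rng = np.random.default_rng(0)
pattern = rng.standard_normal(1024)
signal = np.tile(pattern, 6)
signal[2688:3072] = 0.0
sim = similarity_matrix(frame_features(signal))
best_start, best_score = find_candidate(sim, gap_start=20, gap_end=24)
```

Because the synthetic signal repeats every 8 frames, the selected start frame should sit a whole number of periods away from frame 20. On real music the borders never match exactly, which is why a practical method needs a more careful similarity measure and a smooth transition at the splice points.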
