Minimum-latency Time-frequency Analysis Using Asymmetric Window Functions

We study real-time retrieval of dynamics from a time series via time-frequency (TF) analysis with a minimal-latency guarantee. We provide a rigorous definition of intrinsic latency for several time-frequency representations (TFRs), including the short-time Fourier transform (STFT), the synchrosqueezing transform (SST), and the reassignment method (RM); this definition differs from the well-known notion of intrinsic latency in filter design. To achieve minimal latency, we propose a systematic method, based on the concept of minimum phase, for constructing an asymmetric window from a well-designed symmetric one, provided the window satisfies some weak conditions. We show theoretically that the TFR determined by SST with the constructed asymmetric window has a smaller intrinsic latency. Finally, we apply the method to the music onset detection problem to demonstrate the strength of the proposed algorithm.

[1]  Mark D. Plumbley,et al.  B-Keeper: a beat-tracker for live performance , 2007, NIME '07.

[2]  Rainer Martin,et al.  A low delay, variable resolution, perfect reconstruction spectral analysis-synthesis system for speech enhancement , 2007, 2007 15th European Signal Processing Conference.

[3]  A. W. M. van den Enden,et al.  Discrete Time Signal Processing , 1989 .

[4]  Douglas D. O'Shaughnessy,et al.  Robust feature extraction based on an asymmetric level-dependent auditory filterbank and a subband spectrum enhancement technique , 2014, Digit. Signal Process..

[5]  Douglas D. O'Shaughnessy,et al.  Robust speech recognition under noisy environments using asymmetric tapers , 2012, 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO).

[6]  Chin-Teng Lin,et al.  A Real-Time Wireless Brain–Computer Interface System for Drowsiness Detection , 2010, IEEE Transactions on Biomedical Circuits and Systems.

[7]  Hoon Heo,et al.  Note onset detection based on harmonic cepstrum regularity , 2013, 2013 IEEE International Conference on Multimedia and Expo (ICME).

[8]  Marc Moonen,et al.  Adaptive Time-Frequency Analysis for Noise Reduction in an Audio Filter Bank With Low Delay , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[9]  Florian Krebs,et al.  ONLINE REAL-TIME ONSET DETECTION WITH RECURRENT NEURAL NETWORKS , 2012 .

[10]  Niranjan Damera-Venkata,et al.  Design of optimal minimum-phase digital FIR filters using discrete Hilbert transforms , 2000, IEEE Trans. Signal Process..

[11]  Damjan Vlaj,et al.  ROBUST MFCC FEATURE EXTRACTION ALGORITHM USING EFFICIENT ADDITIVE AND CONVOLUTIONAL NOISE REDUCTION PROCEDURES , 2002 .

[12]  S. Dixon ONSET DETECTION REVISITED , 2006 .

[13]  G. Widmer,et al.  MAXIMUM FILTER VIBRATO SUPPRESSION FOR ONSET DETECTION , 2013 .

[14]  K. Kodera,et al.  Analysis of time-varying signals with small BT values , 1978 .

[15]  Yi-Hsuan Yang,et al.  Multipitch Estimation of Piano Music by Exemplar-Based Sparse Representation , 2012, IEEE Transactions on Multimedia.

[16]  Heinrich W. Lollmann,et al.  Low Delay Filter-Banks for Speech and Audio Processing , 2008 .

[17]  Daniel P. W. Ellis,et al.  Better beat tracking through robust onset aggregation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18]  Christina Gloeckner Foundations Of Time Frequency Analysis , 2016 .

[19]  Florian Krebs,et al.  Evaluating the Online Capabilities of Onset Detection Methods , 2012, ISMIR.

[20]  Rainer Martin,et al.  Improved Reproduction of Stops in Noise Reduction Systems with Adaptive Windows and Nonstationarity Detection , 2009, EURASIP J. Adv. Signal Process..

[21]  Yi Wang,et al.  ConceFT: concentration of frequency and time via a multitapered synchrosqueezed transform , 2015, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[22]  Patrick Flandrin,et al.  Improving the readability of time-frequency and time-scale representations by the reassignment method , 1995, IEEE Trans. Signal Process..

[23]  D.A.F. Florencio On the use of asymmetric windows for reducing the time delay in real-time spectral analysis , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[24]  Masatoshi Suzuki,et al.  An inverse problem for a class of canonical systems and its applications to self-reciprocal polynomials , 2013, Journal d'Analyse Mathématique.

[25]  F. Harris On the use of windows for harmonic analysis with the discrete Fourier transform , 1978, Proceedings of the IEEE.

[26]  Kyogu Lee,et al.  A pairwise approach to simultaneous onset/offset detection for singing voice using correntropy , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[27]  Anoop Gupta,et al.  How low should we go?: understanding the perception of latency while inking , 2014, Graphics Interface.

[28]  Hau-tieng Wu,et al.  Convex Optimization approach to signals with fast varying instantaneous frequency , 2015, 1503.07591.

[29]  Rainer Martin,et al.  Optimization of switchable windows for low-delay spectral analysis-synthesis , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[30]  Yannis Stylianou,et al.  Three Dimensions of Pitched Instrument Onset Detection , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[31]  J. Tsao,et al.  Time‐varying spectral analysis revealing differential effects of sevoflurane anaesthesia: non‐rhythmic‐to‐rhythmic ratio , 2014, Acta anaesthesiologica Scandinavica.

[32]  I. Daubechies,et al.  Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool , 2011 .

[33]  Hau-tieng Wu,et al.  Nonparametric and adaptive modeling of dynamic seasonality and trend with heteroscedastic and dependent errors , 2012, 1210.4672.

[34]  David Wessel,et al.  Perceptual scheduling in real-time music and audio applications , 2001 .

[35]  Alexander Lerch,et al.  CMMSD: A Data Set for Note-Level Segmentation of Monophonic Music , 2014, Semantic Audio.

[36]  Jun Xiao,et al.  Multitaper Time-Frequency Reassignment for Nonstationary Spectrum Estimation and Chirp Enhancement , 2007, IEEE Transactions on Signal Processing.

[37]  Valentin Emiya Transcription automatique de la musique de piano , 2008 .

[38]  Alan V. Oppenheim,et al.  Discrete-time signal processing (2nd ed.) , 1999 .

[39]  Andrew P. McPherson,et al.  Low-Latency Audio Pitch Tracking: A Multi-Modal Sensor-Assisted Approach , 2014, NIME.

[40]  William A. Sethares,et al.  Classification Based on Speech Rhythm via a Temporal Alignment of Spoken Sentences , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[41]  Björn W. Schuller,et al.  Universal Onset Detection with Bidirectional Long Short-Term Memory Neural Networks , 2010, ISMIR.

[42]  Dusan M. Kodek,et al.  Using asymmetric windows in automatic speech recognition , 2007, Speech Commun..

[43]  Hau-Tieng Wu,et al.  Evaluating Physiological Dynamics via Synchrosqueezing: Prediction of Ventilator Weaning , 2013, IEEE Transactions on Biomedical Engineering.

[44]  A. Walden,et al.  Spectral analysis for physical applications : multitaper and conventional univariate techniques , 1996 .

[45]  Yi-Hsuan Yang,et al.  Power-Scaled Spectral Flux and Peak-Valley Group-Delay Methods for Robust Musical Onset Detection , 2014, ICMC.

[46]  Rainer Martin,et al.  Optimal signal reconstruction from a constant-Q spectrum , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[47]  Erik Marchi,et al.  Multi-resolution linear prediction based features for audio onset detection with bidirectional LSTM neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[48]  Yi-Hsuan Yang,et al.  Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription , 2015, CMMR.

[49]  Yi-Hsuan Yang,et al.  Musical Onset Detection Using Constrained Linear Reconstruction , 2015, IEEE Signal Processing Letters.

[50]  Ángel M. Gómez,et al.  On the Use of Asymmetric Windows for Robust Speech Recognition , 2012, Circuits Syst. Signal Process..

[51]  Joseph Timoney,et al.  Real-time detection of musical onsets with linear prediction and sinusoidal modeling , 2011, EURASIP J. Adv. Signal Process..

[52]  Hau-Tieng Wu,et al.  Non‐parametric and adaptive modelling of dynamic periodicity and trend with heteroscedastic and dependent errors , 2014 .

[53]  Mark B. Sandler,et al.  A tutorial on onset detection in music signals , 2005, IEEE Transactions on Speech and Audio Processing.

[54]  Roger B. Dannenberg,et al.  Low-latency Music Software Using Off-the-shelf Operating Systems , 1998, ICMC.

[55]  A. Nuttall Some windows with very good sidelobe behavior , 1981 .

[56]  Anssi Klapuri,et al.  Automatic music transcription: challenges and future directions , 2013, Journal of Intelligent Information Systems.