Higher order waveform symmetry measure and its application to periodicity detectors for speech and singing with fine temporal resolution

We have developed another simple, high-speed F0 extractor with high temporal resolution by adding a higher-order symmetry measure to our previous proposal. This extension makes the proposed method significantly more robust than its predecessor. The proposed method detects the lowest prominent sinusoidal component, and it can be combined with several F0 refinement procedures when the signal is a sum of harmonic sinusoidal components. The refinement procedure presented here is based on a stable representation of the instantaneous frequency of periodic signals. The whole procedure, implemented in MATLAB, runs faster than real time on typical PCs for sounds sampled at 44,100 Hz. Applying the proposed algorithm revealed that rapid temporal modulations in both the F0 trajectory and the spectral envelope typically exist in expressive voices such as those used in lively singing performances.
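As a rough illustration of what the instantaneous frequency of a single sinusoidal component means, the following minimal sketch band-pass filters a test tone with a Hann-weighted complex exponential and reads the frequency from the phase advance between adjacent output samples. It is a generic textbook construction, not the symmetry measure or the refinement procedure of this paper. The function name `instantaneous_frequency`, the 8000 Hz sampling rate, the 200 Hz centre frequency, and the filter length are all assumptions chosen for the example.

```python
import cmath
import math


def instantaneous_frequency(signal, fs, fc, half_len):
    """Return instantaneous-frequency estimates (Hz) of the component near fc."""
    offsets = range(-half_len, half_len + 1)
    # Hann-weighted complex exponential: a band-pass filter centred on fc
    # whose output is a complex (analytic-like) signal.
    kernel = [
        (0.5 + 0.5 * math.cos(math.pi * k / (half_len + 1)))
        * cmath.exp(-2j * math.pi * fc * k / fs)
        for k in offsets
    ]
    # Filter only where the kernel fits entirely inside the signal.
    filtered = [
        sum(c * signal[t + k] for c, k in zip(kernel, offsets))
        for t in range(half_len, len(signal) - half_len)
    ]
    # Phase advance per sample, converted from radians/sample to Hz.
    return [
        fs * cmath.phase(b * a.conjugate()) / (2.0 * math.pi)
        for a, b in zip(filtered, filtered[1:])
    ]


fs = 8000.0  # sampling rate in Hz (illustrative assumption)
f0 = 220.0   # true frequency of the synthetic test tone
tone = [math.cos(2.0 * math.pi * f0 * n / fs) for n in range(800)]

# The filter is centred near, but not exactly on, the true frequency.
estimates = instantaneous_frequency(tone, fs, fc=200.0, half_len=80)
mean_estimate = sum(estimates) / len(estimates)
```

For this stationary tone, `mean_estimate` should lie close to `f0` even though the filter is centred 20 Hz away, because the phase of the filter output rotates at the frequency of the component itself rather than at the filter's centre frequency.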
