Significance of aperiodicity in the pitch perception of expressive voices

In this paper, we study the significance of aperiodicity in the pitch perception of expressive voices, such as Noh voice and laughter. The excitation source characteristics in the production of these signals are represented as a sequence of impulses, which is derived from the acoustic signal using a modified zero-frequency filtering method. The time intervals between successive impulses and the relative amplitudes of the impulses are related to the presence of subharmonics and to pitch perception in expressive voices. The role of aperiodicity and subharmonics in the perception of the distinct voice quality of expressive voices is examined. The significance of aperiodicity is also analysed by synthesis, using two synthetic AM/FM sequences as excitation. Saliency is used as a measure of pitch perception. F0 extraction for expressive voices using this pitch-perception information is also demonstrated.
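The impulse-sequence representation can be illustrated with the standard zero-frequency filtering (ZFF) procedure. This is a minimal sketch, not the modified ZFF method of this paper, whose modifications are not described here. It assumes a trend-removal window of about one pitch period, takes negative-to-positive zero crossings of the filtered signal as impulse locations, and uses the slope at each crossing as the relative impulse amplitude. The helper `impulse_train` and the AM/FM patterns (alternating amplitudes 1.0/0.6, alternating intervals of 72/88 samples at 8 kHz) are hypothetical stand-ins for the paper's synthetic excitation sequences.

```python
import numpy as np


def impulse_train(n_samples, intervals, amplitudes):
    """Hypothetical excitation generator: places impulses by cycling through
    `intervals` (in samples) and `amplitudes`. Alternating intervals give an
    FM-type sequence; alternating amplitudes give an AM-type sequence."""
    e = np.zeros(n_samples)
    n, k = 0, 0
    while n < n_samples:
        e[n] = amplitudes[k % len(amplitudes)]
        n += intervals[k % len(intervals)]
        k += 1
    return e


def zff_epochs(s, win):
    """Standard zero-frequency filtering (not the paper's modified variant).

    Returns impulse (epoch) locations and the slope of the filtered signal
    at each one, used here as the relative impulse amplitude."""
    if win % 2 == 0:
        win += 1                                     # odd window keeps the local mean centred
    y = np.diff(np.asarray(s, dtype=float), prepend=0.0)  # difference to suppress DC offset
    for _ in range(4):                               # two cascaded zero-frequency resonators:
        y = np.cumsum(y)                             # four poles at z = 1, i.e. four running sums
    kernel = np.ones(win) / win
    for _ in range(3):                               # remove the polynomial trend by repeatedly
        y = y - np.convolve(y, kernel, mode="same")  # subtracting the local mean
    # Epochs: negative-to-positive zero crossings. Crossings near the edges are
    # dropped because the zero-padded local mean is unreliable there.
    n = np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1
    margin = 3 * win
    n = n[(n > margin) & (n < len(y) - margin)]
    strength = y[n] - y[n - 1]                       # slope at each crossing
    return n, strength


fs = 8000
n_samples = fs // 2                                  # 0.5 s of excitation
# Negative-polarity impulses, so that each one yields a negative-to-positive
# zero crossing with the sign conventions of zff_epochs.
sequences = {
    "periodic": -impulse_train(n_samples, [80], [1.0]),   # 100 Hz
    "AM": -impulse_train(n_samples, [80], [1.0, 0.6]),    # alternating amplitudes
    "FM": -impulse_train(n_samples, [72, 88], [1.0]),     # alternating intervals
}
results = {}
for name, e in sequences.items():
    epochs, soe = zff_epochs(e, win=81)              # assumed window: ~one 10 ms period
    intervals = np.diff(epochs)                      # samples between successive impulses
    results[name] = (intervals, soe / soe.max())     # relative impulse amplitudes
    print(name, intervals[:4], np.round(soe[:4] / soe.max(), 2), fs / intervals[:4])
```

For the periodic train the intervals and amplitudes are constant. The AM sequence keeps constant intervals but alternates the impulse strengths, while the FM sequence alternates the intervals (in this sketch the trend removal pulls them somewhat toward the mean period). Either alternation repeats every two impulses, which puts energy at half the impulse rate, i.e., a subharmonic. The ratio `fs / intervals` is only an interval-based F0 estimate, not the saliency-based F0 extraction described in the abstract.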
