Analysis of aperiodicity in artistic Noh singing voice using an impulse sequence representation of excitation source.

Aperiodicity in the voice source is caused by changes in the vocal fold vibrations other than the normal quasi-periodicity and the turbulence at the glottis. This aperiodicity appears to be one of the main properties responsible for conveying emotion in artistic voices. This paper examines the feasibility of representing the excitation source characteristics of the artistic (Noh) singing voice by an impulse-like sequence in the time domain. The impulses at the glottal closure instants provide the major excitation of the vocal tract system, and a sequence of such impulses produces harmonics of the fundamental frequency in the spectrum. Variation in the amplitudes of these impulses, or amplitude modulation (AM), contributes to aperiodicity in the excitation and can cause subharmonics to appear in the spectrum. Variation in the intervals between impulses, or frequency modulation (FM), can also contribute to aperiodicity in the excitation. The aperiodic component of the excitation in the Noh voice is examined using the impulse-like sequence derived from the signal by single frequency filtering analysis. The effects of aperiodicity are illustrated for synthetic AM and FM impulse sequences using spectrograms and saliency plots.
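The link between AM or FM of an impulse sequence and subharmonics can be sketched numerically. The following is a minimal illustration, not the analysis used in the paper: it assumes the simplest case in which alternate impulses differ in amplitude (AM) or alternate intervals differ in length (FM), and it compares the spectral magnitude at half the fundamental frequency with the magnitude at the fundamental. The sampling rate, period, modulation depths, and all function names are hypothetical choices made for the example.

```python
import cmath
import math


def impulse_sequence(n_pulses, period, am_depth=0.0, shift=0):
    """Return (positions, amplitudes) of an impulse train, in samples.

    Alternate impulses are scaled by 1 +/- am_depth (AM), and alternate
    intervals are period +/- shift samples (FM). With both left at zero
    the train is strictly periodic.
    """
    positions, amplitudes = [], []
    pos = 0
    for k in range(n_pulses):
        sign = 1 if k % 2 == 0 else -1
        positions.append(pos)
        amplitudes.append(1.0 + sign * am_depth)
        pos += period + sign * shift
    return positions, amplitudes


def magnitude_at(positions, amplitudes, freq_hz, fs):
    """Magnitude of the Fourier transform of the impulses at one frequency."""
    total = sum(
        a * cmath.exp(-2j * math.pi * freq_hz * n / fs)
        for n, a in zip(positions, amplitudes)
    )
    return abs(total)


def subharmonic_ratio(positions, amplitudes, f0, fs):
    """Spectral magnitude at f0/2 relative to the magnitude at f0."""
    half = magnitude_at(positions, amplitudes, f0 / 2, fs)
    full = magnitude_at(positions, amplitudes, f0, fs)
    return half / full


# Illustrative settings (assumptions, not values from the paper).
FS = 16000        # sampling rate in Hz
PERIOD = 80       # samples per period, so f0 = 200 Hz
F0 = FS / PERIOD
N_PULSES = 200    # even, so every long/short pair is complete

periodic = impulse_sequence(N_PULSES, PERIOD)
am = impulse_sequence(N_PULSES, PERIOD, am_depth=0.3)
fm = impulse_sequence(N_PULSES, PERIOD, shift=8)

for name, seq in [("periodic", periodic), ("AM", am), ("FM", fm)]:
    print(f"{name}: subharmonic ratio = {subharmonic_ratio(*seq, F0, FS):.3f}")
```

For the strictly periodic train the component at half the fundamental cancels, while both the AM and FM trains leave a clearly nonzero component there. This is the subharmonic behaviour described above, under the simplifying assumption of period-two modulation.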
