Majorisation-Minimisation Based Optimisation of the Composite Autoregressive System with Application to Glottal Inverse Filtering

The composite autoregressive system can be used to estimate a speech source-filter decomposition in a rigorous manner, thus having potential use in glottal inverse filtering. By introducing a suitable prior, spectral tilt can be introduced into the source component estimation to better correspond to human voice production. However, the current expectationmaximisation based composite autoregressive model optimisation leaves room for improvement in terms of speed. Inspired by majorisation-minimisation techniques used for nonnegative matrix factorisation, this work derives new update rules for the model, resulting in faster convergence compared to the original approach. Additionally, we present a new glottal inverse filtering method based on the composite autoregressive system and compare it with inverse filtering methods currently used in glottal excitation modelling for parametric speech synthesis. These initial results show that the proposed method performs comparatively well, sometimes outperforming the reference methods.

[1]  Hirokazu Kameoka Non-negative Matrix Factorization and Its Variants for Audio Signal Processing , 2016 .

[2]  P. Alku,et al.  Closed phase covariance analysis based on constrained linear prediction for glottal inverse filtering. , 2009, The Journal of the Acoustical Society of America.

[3]  Boris Doval,et al.  Spectral correlates of glottal waveform models: an analytic study , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  T. Hacki Klassifizierung von Glottisdysfunktionen mit Hilfe der Elektroglottographie , 1989 .

[5]  Paavo Alku,et al.  Voice source modelling using deep neural networks for statistical parametric speech synthesis , 2014, 2014 22nd European Signal Processing Conference (EUSIPCO).

[6]  PAAVO ALKU,et al.  Glottal inverse filtering analysis of human voice production — A review of estimation and parameterization methods of the glottal excitation and their applications , 2011 .

[7]  Nancy Bertin,et al.  Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis , 2009, Neural Computation.

[8]  Lauri Juvela,et al.  Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort , 2014, INTERSPEECH.

[9]  J. Liljencrants,et al.  Dept. for Speech, Music and Hearing Quarterly Progress and Status Report a Four-parameter Model of Glottal Flow , 2022 .

[10]  E. Stacy A Generalization of the Gamma Distribution , 1962 .

[11]  P. Alku,et al.  Normalized amplitude quotient for parametrization of the glottal flow. , 2002, The Journal of the Acoustical Society of America.

[12]  Hirokazu Kameoka,et al.  Statistical Model of Speech Signals Based on Composite Autoregressive System with Application to Blind Source Separation , 2010, LVA/ICA.

[13]  Paavo Alku,et al.  Quasi Closed Phase Glottal Inverse Filtering Analysis With Weighted Linear Prediction , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[14]  D G Childers,et al.  Vocal quality factors: analysis, synthesis, and perception. , 1991, The Journal of the Acoustical Society of America.

[15]  Tomoki Toda,et al.  Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation , 2015, 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).

[16]  Hirokazu Kameoka,et al.  Selective Amplifier of Periodic and Non-periodic Components in Concurrent Audio Signals with Spectral Control Envelopes , 2006 .

[17]  Paavo Alku,et al.  Glottal wave analysis with Pitch Synchronous Iterative Adaptive Inverse Filtering , 1991, Speech Commun..

[18]  Hirokazu Kameoka,et al.  Composite autoregressive system for sparse source-filter representation of speech , 2009, 2009 IEEE International Symposium on Circuits and Systems.

[19]  Paavo Alku,et al.  HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[20]  P. Paatero,et al.  Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values† , 1994 .

[21]  Thierry Dutoit,et al.  Glottal closure and opening instant detection from speech signals , 2019, INTERSPEECH.