Using phonetic feature extraction to determine optimal speech regions for maximising the effectiveness of glottal source analysis

Parameterisation of the glottal source has become increasingly useful for speech technology. For many applications it may be desirable to restrict the glottal source feature data to only those speech regions where it can be reliably extracted. In this paper we exploit a previously proposed set of binary phonetic feature extractors to help determine optimal regions for glottal source analysis. Besides validating the phonetic feature extractors, we also quantitatively assess their usefulness for improving voice quality classification and find highly significant reductions in error rates, in particular when nasal and fricative regions are excluded.

Index Terms: Glottal source, voice source, phonetic features, voice quality
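The core idea, selecting only reliable regions before glottal parameterisation, can be sketched as simple frame-level masking. This is a minimal illustration, not the paper's implementation: the binary nasal and fricative detectors here are hypothetical stand-ins for the phonetic feature extractors the paper builds on, and any frame flagged by either detector is excluded from subsequent glottal source analysis.

```python
def select_reliable_frames(frames, is_nasal, is_fricative):
    """Return indices of frames suitable for glottal source analysis.

    frames       -- list of per-frame feature vectors (contents unused here)
    is_nasal     -- per-frame binary decisions from a (hypothetical) nasal detector
    is_fricative -- per-frame binary decisions from a (hypothetical) fricative detector
    """
    # Keep a frame only if neither binary phonetic detector fires on it.
    return [i for i in range(len(frames))
            if not is_nasal[i] and not is_fricative[i]]


# Toy example: six frames; frame 2 is flagged nasal, frame 4 fricative.
frames = [[0.0]] * 6
nasal = [0, 0, 1, 0, 0, 0]
fric = [0, 0, 0, 0, 1, 0]
print(select_reliable_frames(frames, nasal, fric))  # [0, 1, 3, 5]
```

In practice the retained frame indices would gate which portions of the inverse-filtered signal contribute glottal source features (e.g. to a voice quality classifier), which is how excluding nasal and fricative regions can reduce classification error.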
