A Spanish multispeaker database of esophageal speech

Abstract: A laryngectomee is a person whose larynx has been surgically removed, usually because of laryngeal cancer. After surgery, most laryngectomees are able to speak again using techniques learned with the help of a speech therapist. The resulting speech is termed alaryngeal speech, and esophageal speech (ES) is one of several alaryngeal speech production modes. A considerable amount of research has been dedicated to alaryngeal speech, with aims ranging from helping speech therapists with evaluation and diagnosis to improving its quality and intelligibility through digital signal processing techniques. We present a database of Spanish ES voices, named AhoSLABI, designed to support the development of new assistive technologies for this speech impairment. The database consists primarily of recordings of 31 laryngectomees (27 males and 4 females) pronouncing phonetically balanced sentences. It also includes parallel recordings of the same sentences by 9 healthy speakers (6 males and 3 females) to facilitate speech processing tasks that require small parallel corpora, such as voice conversion or synthetic speech adaptation. In addition to the sentences, the database includes sustained vowels and a small set of isolated words, which can be valuable for research on ES analysis, diagnosis and evaluation. The paper describes the main contents of the database, the recording protocol and procedure, and the labeling process. The main acoustic characteristics of the ES voices, such as speaking rate and the durations of recordings, phones and silences, are compared with those of a reduced set of healthy voices. In addition, we describe an experiment that uses the database to improve the performance of an automatic speech recognition (ASR) system for ES speakers. This new resource will be made available to the scientific community in the hope that it will be used to improve the quality of life of laryngectomees.
