Automatic Speech Recognition for ILSE-Interviews: Longitudinal Conversational Speech Recordings Covering Aging and Cognitive Decline

The Interdisciplinary Longitudinal Study on Adult Development and Aging (ILSE) was initiated to investigate satisfying and healthy aging. Over 20 years, about 4,200 hours of biographic interviews with more than 1,000 participants were recorded. Spoken language is a strong indicator of declining cognitive resources, as it is affected at an early stage. Hence, various research topics related to aging, such as dementia, could be studied using data such as the ILSE interviews. The analysis of language capabilities requires transcribed speech. Since manual transcription is time-consuming and costly, we aim to transcribe the ILSE data automatically using Automatic Speech Recognition (ASR). Recognizing the ILSE interviews is very demanding because several challenges combine: 20-year-old analog two-speaker, one-channel recordings of low signal quality; emotional and personal interviews between doctor and participant; and repeated recordings of aging, partly fragile individuals. In this study, we describe ongoing work to develop a hybrid Hidden Markov Model (HMM)-Deep Neural Network (DNN) based ASR system for the ILSE corpus. So far, the best ASR system is obtained by second-pass decoding of a hybrid HMM-DNN model using recurrent neural network based language models, and it reaches a word error rate of 50.39%.
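
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring setup used in this work; the function name `word_error_rate`, the whitespace tokenization, and the example sentences are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are separated by whitespace; no text
    normalization (casing, punctuation) is applied here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one insertion
# against a five-word reference.
wer = word_error_rate("ich bin heute sehr müde", "ich bin heut sehr müde aber")
print(f"WER: {wer:.2%}")
```

Under this definition, a word error rate of about 50% means that the number of word-level errors is roughly half the number of words in the reference transcript. Because insertions count as errors, the value can exceed 100%.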
