Towards Automatic Transcription of ILSE ― an Interdisciplinary Longitudinal Study of Adult Development and Aging

The Interdisciplinary Longitudinal Study on Adult Development and Aging (ILSE) was created to facilitate the study of challenges posed by rapidly aging societies in developed countries such as Germany. ILSE contains over 8,000 hours of biographic interviews recorded from more than 1,000 participants over the course of 20 years. Investigations of various aspects of aging, such as cognitive decline, often rely on the analysis of linguistic features that can be derived from spoken content such as these interviews. However, manual transcription is time-consuming and costly, and so far only 380 hours of the ILSE interviews have been transcribed. The aim of our work is therefore to build systems that transcribe the ILSE interview data fully automatically. The joint occurrence of poor recording quality, long audio segments, erroneous transcriptions, varying speaking styles and crosstalk, and emotional and dialectal speech in these interviews presents challenges for automatic speech recognition (ASR). We describe our ongoing work towards the fully automatic transcription of all ILSE interviews and the steps we implemented in preparing the transcriptions to meet these challenges. Using a recursive long audio alignment procedure, 96 hours of the transcribed data have been made accessible for ASR training.
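The abstract does not spell out the recursive long audio alignment procedure, so the following is only a minimal sketch of the general idea behind such procedures, not the authors' pipeline. It assumes a recognizer is available as a callback: `recognize(t0, t1, words)` is a hypothetical function that decodes the audio between `t0` and `t1` seconds and returns `(word, start, end)` tuples. The names `find_anchors`, `align`, `min_run`, and `max_depth` are likewise illustrative. Runs of words on which the recognizer output and the manual transcript agree are treated as anchors, and the gaps between anchors are decoded and aligned again.

```python
from difflib import SequenceMatcher


def find_anchors(ref_words, hyp_words, min_run=3):
    """Return (ref_start, hyp_start, length) for every run of at least
    min_run consecutive words on which transcript and hypothesis agree."""
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    return [(m.a, m.b, m.size)
            for m in matcher.get_matching_blocks() if m.size >= min_run]


def align(ref, recognize, t0, t1, lo=0, hi=None,
          min_run=3, depth=0, max_depth=3):
    """Align ref[lo:hi] to the audio span [t0, t1) in seconds.

    recognize(t0, t1, words) is a hypothetical callback (an assumption of
    this sketch) returning a list of (word, start, end) tuples.
    Returns a list of (ref_lo, ref_hi, start, end) aligned segments.
    """
    hi = len(ref) if hi is None else hi
    if lo >= hi or t1 <= t0 or depth > max_depth:
        return []
    hyp = recognize(t0, t1, ref[lo:hi])
    anchors = find_anchors(ref[lo:hi], [w for w, _, _ in hyp], min_run)
    aligned = []
    prev_ref, prev_t = lo, t0
    for a, b, n in anchors:
        a_start, a_end = hyp[b][1], hyp[b + n - 1][2]
        # Re-decode and align the unanchored gap before this anchor.
        aligned += align(ref, recognize, prev_t, a_start, prev_ref, lo + a,
                         min_run, depth + 1, max_depth)
        aligned.append((lo + a, lo + a + n, a_start, a_end))
        prev_ref, prev_t = lo + a + n, a_end
    if anchors:
        # Handle the gap after the last anchor; without any anchor the
        # span is left unaligned instead of being decoded again unchanged.
        aligned += align(ref, recognize, prev_t, t1, prev_ref, hi,
                         min_run, depth + 1, max_depth)
    return aligned
```

Transcript words that never fall inside an anchor stay unaligned in this sketch, which is one plausible reason why only part of a transcribed corpus becomes usable for training. In a real system the recognizer would typically be biased towards the transcript of the span being decoded; that step is left to the callback here.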
