Detecting early signs of dementia in conversation

Dementia can affect a person's speech, language and conversational interaction capabilities, and its early diagnosis is of great clinical importance. Recent studies using the qualitative methodology of Conversation Analysis (CA) have demonstrated that communication problems may be picked up during conversations between patients and neurologists, and that this can be used to differentiate between patients with Neurodegenerative Disorders (ND) and those with non-progressive Functional Memory Disorder (FMD). However, manual CA is expensive and difficult to scale up for routine clinical use. This study introduces an automatic approach to processing such conversations, which can help identify the early signs of dementia and distinguish them from the other clinical categories (FMD, Mild Cognitive Impairment (MCI), and Healthy Control (HC)). The dementia detection system starts with a speaker diarisation module that segments an input audio file (determining who talks when). The segmented files are then passed to an automatic speech recogniser (ASR) to transcribe the utterances of each speaker. Next, the feature extraction unit extracts a number of features (CA-inspired, acoustic, lexical and word vector) from the transcripts and audio files. Finally, a classifier is trained on these features to determine the clinical category of the input conversation. Moreover, we investigate replacing the neurologist in the conversation with an Intelligent Virtual Agent (IVA) that asks similar questions. We show that, despite differences between the IVA-led and the neurologist-led conversations, the results achieved with the IVA are as good as those obtained with the neurologists. Furthermore, the IVA can be used to administer more standard cognitive tests, such as verbal fluency tests, and to produce automatic scores, which can then boost the performance of the classifier. The final blind evaluation of the system shows that the classifier can identify early signs of dementia with an acceptable level of accuracy and robustness (considering both sensitivity and specificity).
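
As a rough illustration of the feature extraction and evaluation stages described in the abstract, the following Python sketch computes a few simple conversation-level features from turns that have already been diarised and transcribed, and then derives sensitivity and specificity from predicted labels. Everything here is an assumption made for illustration: the `Turn` structure, the speaker labels, the filled-pause list, and the particular features are hypothetical and are not the feature set or classifier used in the study.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One diarised, transcribed turn (hypothetical structure)."""
    speaker: str   # assumed labels: "patient" or "interviewer"
    start: float   # turn start time in seconds
    end: float     # turn end time in seconds
    text: str      # ASR transcript of the turn


# Illustrative filled-pause tokens; a real system would use its own list.
FILLED_PAUSES = {"um", "uh", "er", "erm"}


def conversation_features(turns):
    """Compute simple patient-side features from a list of turns."""
    patient = [t for t in turns if t.speaker == "patient"]
    n_turns = len(patient)
    words = [w for t in patient for w in t.text.lower().split()]
    total_duration = sum(t.end - t.start for t in patient)
    n_pauses = sum(1 for w in words if w in FILLED_PAUSES)
    return {
        "n_patient_turns": n_turns,
        "avg_turn_duration": total_duration / n_turns if n_turns else 0.0,
        "avg_words_per_turn": len(words) / n_turns if n_turns else 0.0,
        "filled_pause_rate": n_pauses / len(words) if words else 0.0,
    }


def sensitivity_specificity(y_true, y_pred, positive="ND"):
    """Return (sensitivity, specificity), treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity
```

In a full pipeline, the dictionary returned by `conversation_features` would be combined with acoustic, lexical and word vector features before being passed to a classifier; reporting sensitivity and specificity together, as the abstract does, avoids judging the system on accuracy alone.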


[228]  Robert J. Moore,et al.  Automated Transcription and Conversation Analysis , 2015 .

[229]  Sanjeev Khudanpur,et al.  Deep neural network-based speaker embeddings for end-to-end speaker verification , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[230]  B. Löwe,et al.  A brief measure for assessing generalized anxiety disorder: the GAD-7. , 2006, Archives of internal medicine.

[231]  Mark J. F. Gales,et al.  Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..

[232]  L. Leach,et al.  Diagnostic utility of abbreviated fluency measures in Alzheimer disease and vascular dementia , 2004, Neurology.

[233]  M. Sabbagh,et al.  New Acetylcholinesterase Inhibitors for Alzheimer's Disease , 2011, International journal of Alzheimer's disease.

[234]  Nicholas W. D. Evans,et al.  Speaker Diarization: A Review of Recent Research , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[235]  Herbert Gish,et al.  A parametric approach to vocal tract length normalization , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[236]  Stephan Vogel,et al.  Speech recognition challenge in the wild: Arabic MGB-3 , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).

[237]  M. Reuber,et al.  Memory difficulties are not always a sign of incipient dementia: a review of the possible causes of loss of memory efficiency. , 2014, British medical bulletin.

[238]  Steven Baker,et al.  Active ageing with avatars: a virtual exercise class for older adults , 2016, OZCHI.

[239]  Matthew Trinkle,et al.  Automatic Detection and Removal of Disfluencies from Spontaneous Speech , 2010 .

[240]  Peter Bell,et al.  A system for automatic broadcast news summarisation, geolocation and translation , 2015, INTERSPEECH.

[241]  Timothy Baldwin,et al.  An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation , 2016, Rep4NLP@ACL.

[242]  Nicholas C. Firth,et al.  MODELLING EYE-TRACKING DATA TO DISCRIMINATE BETWEEN ALZHEIMER'S PATIENTS AND HEALTHY CONTROLS , 2017, Alzheimer's & Dementia.

[243]  Heidi Christensen,et al.  Computational Cognitive Assessment: Investigating the Use of an Intelligent Virtual Agent for the Detection of Early Signs of Dementia , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[244]  Jing Peng,et al.  An Efficient Gradient-Based Algorithm for On-Line Training of Recurrent Network Trajectories , 1990, Neural Computation.

[245]  Karen Spärck Jones A statistical interpretation of term specificity and its application in retrieval , 2021, J. Documentation.

[246]  Hervé Bourlard,et al.  Connectionist Speech Recognition: A Hybrid Approach , 1993 .

[247]  Sylvain Meignier,et al.  LIUM SPKDIARIZATION: AN OPEN SOURCE TOOLKIT FOR DIARIZATION , 2010 .

[248]  Gábor Gosztolya,et al.  A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech , 2018, Current Alzheimer research.

[249]  Kai Gao,et al.  Text Understanding with a Hybrid Neural Network Based Learning , 2017, ICPCSEE.

[250]  Carmen García-Mateo,et al.  Depression Detection Using Automatic Transcriptions of De-Identified Speech , 2017, INTERSPEECH.

[251]  J. Sidnell,et al.  The Handbook of Conversation Analysis: Sidnell/The Handbook of Conversation Analysis , 2012 .

[252]  Jon Barker,et al.  The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines , 2018, INTERSPEECH.