Exploiting Pre-Trained ASR Models for Alzheimer's Disease Recognition Through Spontaneous Speech

Copyright: © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1 Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China; yingqin@bjtu.edu.cn
2 Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, China; {louislau_1129, siioing, lijingyu0125}@link.cuhk.edu.hk, jerrypeng1937@gmail.com
* Correspondence: tanlee@cuhk.edu.hk
‡ These authors contributed equally to this work.