Cantonese AphasiaBank: An annotated database of spoken discourse and co-verbal gestures by healthy and language-impaired native Cantonese speakers

This article reports the construction of the Cantonese AphasiaBank, a multimodal annotated database of spoken discourse and co-verbal gestures produced by healthy native speakers of Cantonese and by individuals with language impairment. The corpus was established as a foundation for aphasiologists and clinicians to design and conduct research into theoretical and clinical issues related to acquired language disorders in Chinese. The article describes the purpose, structure, and levels of annotation of the database, which contains part-of-speech-annotated orthographic transcripts with Romanization and the corresponding videos. The discussion presents the challenges of building a spoken database for a language that is not linguistically well researched and that lacks a standardized written form for many of its lexical items, and explains how these issues were addressed. Most importantly, the article highlights the potential of the Cantonese AphasiaBank as a powerful research tool for linguists and psycholinguists.
