CantoMap: a Hong Kong Cantonese MapTask Corpus

This work reports on the construction of a corpus of connected spoken Hong Kong Cantonese. The corpus aims to provide an additional resource for the study of modern (Hong Kong) Cantonese, and it also includes several controlled elicitation tasks that will serve different projects on the phonology and semantics of Cantonese. The word-segmented corpus offers recordings, phonemic transcriptions, and transcriptions in Chinese characters. It contains a total of 768 minutes of recordings and transcripts from forty speakers. All the audio material has been aligned with the transcriptions at the utterance level using the ELAN transcription and annotation tool. The controlled elicitation task was based on the design of the HCRC MapTask corpus (Anderson et al., 1991), in which participants had to communicate by verbal means alone because eye contact was restricted. In this paper, we outline the design of the maps and their landmarks, the basic principles used to segment the data, and the transcription conventions we adopted. We also compare the contents of CantoMap with those of comparable Cantonese corpora.
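The abstract states that the audio is aligned with the transcriptions at the utterance level in ELAN. As a minimal sketch, assuming the annotations are stored in ELAN's standard .eaf XML format with one time-aligned utterance tier per speaker, the aligned utterances could be read with Python's standard library as shown below. The tier names `giver` and `follower`, the helper `read_eaf_utterances`, and the sample romanized text are hypothetical illustrations, not taken from the CantoMap release.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Utterance:
    tier: str      # ELAN tier ID (hypothetical speaker tiers in the sample below)
    start_ms: int  # utterance onset in milliseconds
    end_ms: int    # utterance offset in milliseconds
    text: str      # transcription stored in the annotation


def read_eaf_utterances(eaf_xml: str) -> list[Utterance]:
    """Return all time-aligned annotations in an EAF document, sorted by onset."""
    root = ET.fromstring(eaf_xml)

    # Map each TIME_SLOT_ID to its millisecond value; unaligned slots lack TIME_VALUE.
    slots = {}
    for ts in root.iter("TIME_SLOT"):
        value = ts.get("TIME_VALUE")
        if value is not None:
            slots[ts.get("TIME_SLOT_ID")] = int(value)

    utterances = []
    for tier in root.iter("TIER"):
        tier_id = tier.get("TIER_ID")
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            ref1 = ann.get("TIME_SLOT_REF1")
            ref2 = ann.get("TIME_SLOT_REF2")
            # Skip annotations whose boundaries are not anchored to real times.
            if ref1 not in slots or ref2 not in slots:
                continue
            value_el = ann.find("ANNOTATION_VALUE")
            text = (value_el.text or "") if value_el is not None else ""
            utterances.append(Utterance(tier_id, slots[ref1], slots[ref2], text.strip()))

    utterances.sort(key=lambda u: (u.start_ms, u.tier))
    return utterances


# Tiny hypothetical EAF fragment with one utterance per speaker tier.
SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="2600"/>
  </TIME_ORDER>
  <TIER TIER_ID="follower">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>hou2 aa3</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="giver">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>nei5 hou2</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

if __name__ == "__main__":
    for u in read_eaf_utterances(SAMPLE_EAF):
        print(f"{u.tier}\t{u.start_ms}-{u.end_ms}\t{u.text}")
```

Sorting by onset rather than by tier interleaves the two speakers' turns in conversational order, which is usually the convenient view for dialogue data such as map-task sessions.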
