The Indigenous Languages Technology project at NRC Canada: An empowerment-oriented approach to developing language software

This paper surveys the first, three-year phase of a project at the National Research Council of Canada that is developing software to assist Indigenous communities in Canada in preserving their languages and extending their use. The project aimed to work within the empowerment paradigm, in which collaboration with communities and fulfillment of their goals are central. Because many of the technologies we developed responded to community needs, the project ended up as a collection of diverse subprojects. These include: the creation of a sophisticated framework for building verb conjugators for highly inflectional polysynthetic languages (such as Kanyen’kéha, in the Iroquoian language family); the release of what is probably the largest available corpus of sentences in a polysynthetic language (Inuktut) aligned with English sentences, together with experiments with machine translation (MT) systems trained on this corpus; free online services based on automatic speech recognition (ASR) for easing the transcription bottleneck for recordings of speech in Indigenous languages (and other languages); software for implementing text prediction and read-along audiobooks for Indigenous languages; and several other subprojects.
