A. Pine, M. Turin. Seeing the Heiltsuk Orthography from Font Encoding through to Unicode. Proceedings of the LREC 2018 Workshop “CCURL 2018 – Sustaining Knowledge Diversity in the Digital Age”

Across the world’s languages and cultures, most writing systems predate the use of computers. In the early years of ICT, standards and protocols for encoding and rendering the majority of the world’s writing systems were not in place. The opportunity to deploy less commonly used orthographies in cross-platform digital contexts has steadily increased since Unicode became the most widely used encoding on the web in late 2007 (Davis, 2008). But what happens to resources that were developed before Unicode standards became widespread? While many tools have been created to address this problem and other issues related to transliteration and character-level substitutions,1 this paper describes the process undertaken for the Indigenous and endangered Heiltsuk (Wakashan) language. It also outlines a tool, Convertextract, that was designed to convert not only plain text but also Microsoft Office documents (pptx, xlsx, docx), with the goal of upgrading pre-existing digital textual resources to Unicode standards and thus preserving the knowledge they contain for both the present and the future.
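To make the idea of character-level substitution concrete, the following is a minimal Python sketch of how font-encoded text might be mapped to Unicode. It is an illustration only: the mapping table, the function names, and the choice to normalize the result to NFC are all assumptions made for this example, and none of them is taken from Convertextract or from the actual Heiltsuk font encoding.

```python
import re
import unicodedata

# HYPOTHETICAL mapping from legacy font-encoded sequences to Unicode.
# These pairs are invented for illustration; they are not the real
# Heiltsuk conversion table.
LEGACY_TO_UNICODE = {
    "@": "\u0259",          # schwa
    "@'": "\u0259\u0301",   # schwa followed by a combining acute accent
    "a^": "a\u0301",        # a followed by a combining acute accent
    "l~": "\u019b",         # lambda with stroke
}


def build_converter(mapping):
    """Return a function that applies `mapping` to a string.

    Keys are tried longest first, so a multi-character sequence such as
    "@'" is matched before its one-character prefix "@".
    """
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))

    def convert(text):
        replaced = pattern.sub(lambda match: mapping[match.group(0)], text)
        # Assumption: output is normalized to NFC so that equivalent
        # base + combining sequences end up with a single representation.
        return unicodedata.normalize("NFC", replaced)

    return convert


convert = build_converter(LEGACY_TO_UNICODE)
```

Calling `convert` on a string replaces every mapped sequence in a single pass and leaves unmapped characters untouched. Handling Office documents would additionally require reading and rewriting the text runs inside each file, which this sketch does not attempt.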