Language Technology Programme for Icelandic 2019-2023

In this paper, we describe a new national language technology programme for Icelandic. The programme, which spans a period of five years, aims to make Icelandic usable in communication and interactions in the digital world by developing accessible, open-source language resources and software. The research and development work within the programme is carried out by a consortium of universities, institutions, and private companies, with a strong emphasis on cooperation between academia and industry. Five core projects form the main content of the programme: language resources, speech recognition, speech synthesis, machine translation, and spell and grammar checking. We also describe other national language technology programmes and give an overview of the history of language technology in Iceland.
