The Lacunae of Danish Natural Language Processing

Danish is a North Germanic language spoken principally in Denmark, a country with a long tradition of technological and scientific innovation. However, the language has received relatively little attention from a technological perspective. In this paper, we review Natural Language Processing (NLP) research, digital resources, and tools that have been developed for Danish. We find that the availability of models and tools is limited, which calls for work that brings Danish NLP closer to the level of the more privileged languages.

Danish abstract (translated): Danish is a North Germanic language spoken primarily in the Kingdom of Denmark, a country with a strong tradition of technological and scientific innovation. The Danish language has, however, received relatively limited attention from a technological standpoint. In this article, we review language technology research, resources, and tools developed for Danish. We conclude that only a few models and tools exist, which invites research that raises Danish language technology to the level of more privileged languages.
