Morphosyntactic Tagging of Old Icelandic Texts and Its Use in Studying Syntactic Variation and Change

We describe experiments with morphosyntactic tagging of Old Icelandic (Old Norse) narrative texts using different tagging models for the TnT tagger [9] and a tagset of almost 700 tags, originally developed for Modern Icelandic. We show that a model trained on both Old and Modern Icelandic texts achieves 92.7% tagging accuracy, considerably better than the 90.4% that has been reported for Modern Icelandic. Although our tagging is morphological in nature, the tags carry a substantial amount of syntactic information, and the tagging is detailed enough that the syntactic function of a word can more or less be deduced from its morphology and the adjacent words. We show that the morphosyntactic tags can be very useful for locating certain syntactic constructions and features in a large corpus of Old Icelandic narrative texts. We demonstrate this by searching for, and finding, previously undiscovered examples of a number of syntactic constructions in the corpus. We conclude that in a highly inflectional language, a morphologically tagged corpus can be an important tool for studying syntactic variation and change in the absence of a fully parsed corpus, although a parsed corpus would of course offer more possibilities.
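The claim that tags alone can locate syntactic constructions can be illustrated with a small search over tag sequences. The Python sketch below is a minimal illustration under stated assumptions, not the authors' procedure: it assumes the corpus is available as (word, tag) pairs and that tags follow a positional layout like that of the Modern Icelandic tagset (word class first, then features such as mood, person, number, case). The function name `find_tag_sequences`, the tag regexes, and the hand-tagged toy sentences are all hypothetical. The reference list includes dedicated corpus query systems ([4], [8]); this sketch only imitates the core idea of matching consecutive tokens against tag patterns.

```python
import re
from typing import List, Sequence, Tuple

# A token is a (word form, morphosyntactic tag) pair.
Token = Tuple[str, str]


def find_tag_sequences(tokens: Sequence[Token],
                       pattern: Sequence[str]) -> List[Tuple[int, List[Token]]]:
    """Return (start index, matched tokens) for every run of consecutive
    tokens whose tags fully match the corresponding regex in `pattern`."""
    compiled = [re.compile(p) for p in pattern]
    hits = []
    for start in range(len(tokens) - len(compiled) + 1):
        window = tokens[start:start + len(compiled)]
        if all(rx.fullmatch(tag) for rx, (_, tag) in zip(compiled, window)):
            hits.append((start, list(window)))
    return hits


# ASSUMED positional tag layout (hypothetical illustration):
#   verb: s + mood (f=indicative, v=subjunctive) + voice (g/m)
#         + person (1-3) + number (e/f) + tense (n=present, þ=past)
#   noun: n + gender (k/v/h) + number (e/f) + case (n=nominative, ...)
#         followed by optional suffixes such as the definite article.
FINITE_VERB = r"s[fv][gm][123][ef][nþ]"
NOMINATIVE_NOUN = r"n[kvh][ef]n.*"

# Finite verb immediately followed by a nominative noun: a rough
# candidate pattern for verb-subject order.
VS_PATTERN = [FINITE_VERB, NOMINATIVE_NOUN]

# Hand-tagged toy sentences, NOT real tagger output.
SENTENCE_VS = [("Þá", "aa"), ("gaf", "sfg3eþ"), ("konungr", "nken"),
               ("honum", "fpkeþ"), ("sverð", "nheo")]
SENTENCE_SV = [("Konungr", "nken"), ("gaf", "sfg3eþ"),
               ("honum", "fpkeþ"), ("sverð", "nheo")]

for sentence in (SENTENCE_VS, SENTENCE_SV):
    for start, match in find_tag_sequences(sentence, VS_PATTERN):
        print(start, " ".join(word for word, _ in match))
# prints "1 gaf konungr" (only the first sentence matches)
```

Because the search sees only the tags, any tagging error propagates into the results: with accuracy around 92.7%, some hits will be spurious and some genuine examples will be missed, so hits still need manual checking. A realistic pattern would also have to allow pronoun subjects and multi-word noun phrases.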

[1] Beatrice Santorini et al. Building a Large Annotated Corpus of English: The Penn Treebank. 1993, CL.

[2] Jens Haugan. Old Norse Word Order and Information Structure. 2000.

[3] E. Rögnvaldsson et al. The corpus of spoken Icelandic and its morphosyntactic annotation. 2006.

[4] Janne Bondi Johannessen et al. Glossa: a Multilingual, Multimodal, Configurable User Interface. 2008, LREC.

[5] Jan Terje Faarlund et al. The syntax of Old Norse. 2004.

[6] J. T. Faarlund. Syntactic change: toward a theory of historical syntax. 1990.

[7] Eiríkur Rögnvaldsson et al. The Icelandic Parsed Historical Corpus (IcePaHC). 2012, LREC.

[8] Oliver Christ et al. A Modular and Flexible Architecture for an Integrated Corpus Query System. 1994, ArXiv.

[9] Thorsten Brants et al. TnT – A Statistical Part-of-Speech Tagger. 2000, ANLP.

[10] Ans van Kemenade et al. Verb-Object Order in Early Middle English. 2000.

[11] John D. Sundquist. Object Shift and Holmberg’s Generalization in the History of Norwegian. 2002.

[12] Eiríkur Rögnvaldsson et al. Improving the PoS tagging accuracy of Icelandic text. 2009, NODALIDA.

[13] Eiríkur Rögnvaldsson et al. Coping with Variation in the Icelandic Diachronic Treebank. 2011.

[14] David Lightfoot et al. Syntactic effects of morphological change. 2002.

[15] Susan C. Herring et al. Textual parameters in older languages. 2001.

[16] Höskuldur Thráinsson et al. The Syntax of Icelandic. 2007.

[17] Höskuldur Thráinsson et al. Word Order and Syntactic Features in the Scandinavian Languages and English. 1989, Nordic Journal of Linguistics.

[18] George Tsoulas et al. Diachronic syntax: models and mechanisms. 2000.

[19] Philip Resnik et al. Bootstrapping parsers via syntactic projection across parallel texts. 2005, Natural Language Engineering.

[20] Eiríkur Rögnvaldsson et al. Word order variation in the VP in Old Icelandic. 1996.

[21] Eiríkur Rögnvaldsson. Old Icelandic: A Non-Configurational Language? 1995.