The IPP effect in Afrikaans: a corpus analysis

NLP tools for linguistic analysis of Afrikaans are still limited compared to those available for well-resourced languages such as English and Dutch. To facilitate corpus-based linguistic research on Afrikaans, we are creating a treebank based on the Taalkommissie corpus. We adapted a tokenizer and a shallow parser, and used a TnT tagger for part-of-speech annotation. The first linguistic phenomenon we are investigating is the occurrence of infinitivus pro participio (IPP) in Afrikaans. IPP refers to constructions with a perfect auxiliary in which an infinitive appears instead of the expected past participle. The phenomenon has been studied extensively in Dutch and German, but studies on Afrikaans IPP triggers are scarce. The literature often mentions that, in contrast to Dutch and German, IPP is optional in Afrikaans. We aim to verify this claim through a corpus analysis.
