A set of open source tools for Turkish natural language processing

This paper introduces a set of freely available, open-source tools for Turkish built around TRmorph, a morphological analyzer first described in Çöltekin (2010a). The article first provides an update on the analyzer: a complete rewrite in a different finite-state description language and tool set, together with major tagset changes intended to align better with current practice in the computational processing of Turkish and with the user requests received so far. Beyond these changes to the analyzer, the paper introduces tools for morphological segmentation, stemming and lemmatization, guessing unknown words, grapheme-to-phoneme conversion, hyphenation, and morphological disambiguation.

[1] Cem Bozşahin, et al. Semi-supervised morpheme segmentation without morphological analysis, 2012.

[2] Gülşen Eryiğit, et al. Redefinition of Turkish Morphology Using Flag Diacritics, 2013.

[3] Deniz Yuret, et al. Learning Morphological Disambiguation Rules for Turkish, 2006, NAACL.

[4] Halit Oguztüzün, et al. Semantic Expansion of Tweet Contents for Enhanced Event Detection in Twitter, 2012, IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.

[5] Francis M. Tyers, et al. A finite-state morphological transducer for Kyrgyz, 2012, LREC.

[6] B. MacWhinney, et al. The Child Language Data Exchange System: an update, 1990, Journal of Child Language.

[7] Meltem Turhan Yöndem, et al. Two Alternate Methods for Information Retrieval from Turkish Radiology Reports, 2011, ISCIS.

[8] Kemal Oflazer, et al. Dependency Parsing of Turkish, 2008, CL.

[9] Cem Bozşahin, et al. Syllables, Morphemes and Bayesian Computational Models of Acquiring a Word Grammar, 2007.

[10] Çağrı Çöltekin. Improving Successor Variety for Morphological Segmentation, 2010.

[11] Mans Hulden, et al. Foma: a Finite-State Compiler and Library, 2009, EACL.

[12] Maciej Janicki. Unsupervised Learning of A-Morphous Inflection with Graph Clustering, 2013, RANLP.

[13] Kemal Oflazer, et al. Tagging and Morphological Disambiguation of Turkish Text, 1994, ANLP.

[14] Tommi A. Pirinen, et al. Using HFST for Creating Computational Linguistic Applications, 2013, Computational Linguistics - Applications.

[15] Gökhan Tür, et al. Morphological Disambiguation by Voting Constraints, 1997, ACL.

[16] John Nerbonne, et al. An explicit statistical model of learning lexical segmentation using multiple cues, 2014, EACL.

[17] Murat Saraclar, et al. Morphological Disambiguation of Turkish Text with Perceptron Algorithm, 2009, CICLing.

[18] Kemal Oflazer, et al. The architecture and the implementation of a finite state pronunciation lexicon for Turkish, 2006, Comput. Speech Lang.

[19] Tommi A. Pirinen, et al. HFST Tools for Morphology - An Efficient Open-Source Package for Construction of Morphological Analyzers, 2009, SFCM.

[20] Silvia Bernardini, et al. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora, 2009, Lang. Resour. Evaluation.

[21] Çağrı Çöltekin, et al. A Freely Available Morphological Analyzer for Turkish, 2010, LREC.

[22] Mathieu Avanzi, et al. Mapping to prosody: Not all parentheticals are alike, 2015.

[23] Bakyt M. Kairakbay, et al. Finite State Approach to the Kazakh Nominal Paradigm, 2013, FSMNLP.

[24] A. Göksel, et al. Turkish: A Comprehensive Grammar, 2004.

[25] Francis M. Tyers, et al. The Apertium machine translation platform: five years on, 2009.

[26] Kemal Oflazer, et al. Two-level Description of Turkish Morphology, 1993, EACL.

[27] Gökhan Tür, et al. Statistical Morphological Disambiguation for Agglutinative Languages, 2000, COLING.

[28] Helmut Schmid, et al. A Programming Language for Finite State Transducers, 2005, FSMNLP.