Initial Explorations on using CRFs for Turkish Named Entity Recognition

This paper reports the highest results in the literature for Turkish named entity recognition (95% in the MUC metric and 92% in the CoNLL metric), specifically for the task of detecting person, location and organization entities in general news texts. We give an in-depth analysis of previously reported results and compare against them wherever possible. We use conditional random fields (CRFs) as our statistical model. The paper presents initial explorations of using the rich morphological structure of the Turkish language as features for CRFs, together with some basic and generative gazetteers.
