Named Entity Recognition in Turkish with Bayesian Learning and Hybrid Approaches

Named entity recognition is an important information extraction task. In this paper, we present two approaches to named entity recognition on Turkish texts. The first is a Bayesian learning approach trained on a considerably limited training set. The second comprises two hybrid systems that combine this Bayesian learning approach with a previously proposed rule-based named entity recognizer. All three proposed systems achieve promising performance. This paper is significant because it reports the first use of a Bayesian approach for named entity recognition on Turkish texts, a task for which practical approaches in particular are still scarce.
