Classifying Wikipedia entities into fine-grained classes

Recognition of named entities (people, companies, locations, etc.) is an essential task in text analytics. We address one subproblem of this task, namely named entity classification. We propose a novel approach for constructing an effective fine-grained named entity classifier. Its key features are semi-automatic construction of the training set from Wikipedia articles and an additional feature selection step. We validate the approach by building an 18-class classifier and demonstrating its effectiveness and efficiency.
