Development of semi-supervised named entity recognition to discover new tourism places

Demand for tourism information is growing as travel becomes a primary need for some people, and the number of tourism information providers has grown with it. The sheer volume of available information can leave tourists unsure which information they actually need. Current search systems rely only on indexing web pages, so the results are often unsatisfactory: they simply return pages whose articles contain the query keywords. A supporting system that recognizes tourism places in web pages is needed to present information better. In this study, a recognition system based on Yet Another Two Stage Idea (YATSI) semi-supervised learning with a Naïve Bayes classifier is used to address the problem. Classifying candidate entities from one hundred web pages yields 74% precision and 70% recall.
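The two-stage idea can be illustrated with a minimal sketch. It assumes the usual description of YATSI: stage one trains a classifier (here a small Bernoulli Naïve Bayes) on the labeled data and pre-labels the unlabeled data; stage two classifies new items with a weighted nearest-neighbor vote over both sets, where pre-labeled items carry weight F × (labeled count / unlabeled count). The binary features, the toy data, and the values of `k` and `f` are hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict

def train_nb(X, y):
    """Bernoulli Naive Bayes with Laplace smoothing on binary feature vectors."""
    prior, cond = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior[c] = len(rows) / len(X)
        # P(feature j = 1 | class c), smoothed so no probability is 0 or 1
        cond[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                   for j in range(len(X[0]))]
    return prior, cond

def predict_nb(model, x):
    prior, cond = model
    def log_posterior(c):
        return math.log(prior[c]) + sum(
            math.log(p if v else 1 - p) for p, v in zip(cond[c], x))
    return max(prior, key=log_posterior)

def yatsi_predict(X_lab, y_lab, X_unl, x_new, k=3, f=1.0):
    # Stage 1: train on labeled data, then pre-label the unlabeled data.
    model = train_nb(X_lab, y_lab)
    y_pre = [predict_nb(model, x) for x in X_unl]

    # Stage 2: weighted nearest-neighbor vote over labeled + pre-labeled items.
    # Labeled items weigh 1.0; pre-labeled items weigh f * N / M (assumption).
    w_unl = f * len(X_lab) / len(X_unl)
    pool = [(x, y, 1.0) for x, y in zip(X_lab, y_lab)]
    pool += [(x, y, w_unl) for x, y in zip(X_unl, y_pre)]

    def hamming(a, b):
        return sum(u != v for u, v in zip(a, b))

    nearest = sorted(pool, key=lambda item: hamming(item[0], x_new))[:k]
    votes = defaultdict(float)
    for _, label, weight in nearest:
        votes[label] += weight
    return max(votes, key=votes.get)

# Hypothetical binary features per candidate entity:
# [is_capitalized, follows_location_word, near_tourism_keyword]
X_lab = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
y_lab = ["place", "place", "other", "other"]
X_unl = [[1, 1, 1], [0, 0, 0], [0, 1, 1], [0, 0, 1]]

print(yatsi_predict(X_lab, y_lab, X_unl, [1, 1, 1]))
print(yatsi_predict(X_lab, y_lab, X_unl, [0, 0, 0]))
```

Lowering `f` reduces the influence of the pre-labeled items, which is useful when the stage-one classifier is expected to be noisy.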
