Spatial Information Recognition in Web Documents Using a Semi-supervised Machine Learning Method

Web documents are a promising source of spatial information. With information recognition and extraction, this information can be used in various applications such as building semantic maps and indoor robotic navigation. In this paper, we present a novel methodology to identify spatial information in web documents using semi-supervised trained machine learning classifiers. The semi-supervised models trained with the half amount of data available yield only the F-score of 4% and 9% inferior to the supervised models trained with complete data on classifying spatial entities and relationships respectively.

[1]  Naoaki Okazaki,et al.  Named entity recognition with multiple segment representations , 2013, Inf. Process. Manag..

[2]  Inderjeet Mani,et al.  SpatialML: annotation scheme, resources, and evaluation , 2010, Lang. Resour. Evaluation.

[3]  Satoshi Sekine,et al.  A survey of named entity recognition and classification , 2007 .

[4]  Gordon Wyeth,et al.  Reasoning about natural language phrases for semantic goal driven exploration , 2015, ICRA 2015.

[5]  Roberto Basili,et al.  UNITOR-HMM-TK: Structured Kernel-based learning for Spatial Role Labeling , 2013, SemEval@NAACL-HLT.

[6]  Richi Nayak,et al.  Finding Within-Organisation Spatial Information on the Web , 2015, Australasian Conference on Artificial Intelligence.

[7]  L. M. Nithya,et al.  A Survey on Semi-Supervised Learning Techniques , 2014, ArXiv.

[8]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[9]  Andrew McCallum,et al.  An Introduction to Conditional Random Fields , 2010, Found. Trends Mach. Learn..

[10]  Christopher D. Manning,et al.  Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling , 2005, ACL.

[11]  Sanda M. Harabagiu,et al.  UTD-SpRL: A Joint Approach to Spatial Role Labeling , 2012, SemEval@NAACL-HLT.

[12]  Maksim Tkatchenko,et al.  Named entity recognition: Exploring features , 2012, KONVENS.

[13]  Matthew R. Walter,et al.  A framework for learning semantic maps from grounded natural language descriptions , 2014, Int. J. Robotics Res..