GeoMantis: Inferring the Geographic Focus of Text using Knowledge Bases

We consider the problem of identifying the geographic focus of a document. Unlike some previous work on this problem, we do not assume that the document explicitly mentions the target region, which makes our problem one of inference or prediction rather than one of identification. Further, we tackle the problem without appealing to specialized geographic information resources such as gazetteers or atlases, and instead employ general-purpose knowledge bases and ontologies such as ConceptNet and YAGO. We propose certain natural strategies for addressing the problem, and show that the GeoMantis system, which implements these strategies, outperforms an existing state-of-the-art system when compared on documents whose target region (specifically, the country) is not explicitly mentioned or is obscured. Our results give evidence that general-purpose knowledge bases and ontologies can, in certain cases, outperform even specialized tools.
