Reading News with Maps: The Power of Searching with Spatial Synonyms

The NewsStand system is an example application of a general framework that we are developing to enable people to search for information using a map query interface. The information results from monitoring the output of over 8,000 RSS news sources and is available for retrieval within minutes of publication. The advantage of a map query interface is that a map, coupled with the ability to vary the zoom level at which it is viewed, provides an inherent granularity to the search process that facilitates approximate search. This distinguishes it from today’s prevalent keyword-based search methods, which provide only a very limited facility for approximate search, realized primarily by permitting a match on a subset of the keywords. However, users often do not have a firm grasp of which keyword to use, and thus would welcome a search that also takes synonyms into account. For queries on spatially-referenced data, the map query interface is a step in this direction: pointing at a location (e.g., by appropriately positioning a pointing device), with the precision of that position interpreted according to the zoom level, is equivalent to permitting the use of spatial synonyms. We discuss the issues that arise in the design of such a system, including the identification of words that correspond to geographic locations, and provide examples of the utility of the approach, which represents a step forward in the emerging field of computational journalism.
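
As a rough illustration of the spatial-synonym idea, and not a description of how NewsStand is actually implemented, the following Python sketch treats every place within a zoom-dependent radius of the pointed-at position as a match. The function names, the radius formula (halving with each zoom level), and the tiny gazetteer with approximate coordinates are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Hypothetical mini-gazetteer: name -> (latitude, longitude), approximate values.
GAZETTEER = {
    "Washington, DC": (38.91, -77.04),
    "College Park, MD": (38.99, -76.94),
    "Baltimore, MD": (39.29, -76.61),
    "Los Angeles, CA": (34.05, -118.24),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tolerance_km(zoom):
    """Assumed precision of a pointing action: halves with every zoom level."""
    return 20000.0 / (2 ** zoom)


def spatial_synonyms(lat, lon, zoom, gazetteer=GAZETTEER):
    """Return the places that count as matches for a click at (lat, lon).

    The lower the zoom level, the coarser the interpretation of the click,
    so more places are accepted as "synonyms" of the pointed-at location.
    Results are ordered from nearest to farthest.
    """
    radius = tolerance_km(zoom)
    hits = []
    for name, (plat, plon) in gazetteer.items():
        dist = haversine_km(lat, lon, plat, plon)
        if dist <= radius:
            hits.append((dist, name))
    return [name for _, name in sorted(hits)]


if __name__ == "__main__":
    # The same click is interpreted more or less precisely depending on zoom.
    for zoom in (4, 10, 12):
        print(zoom, spatial_synonyms(38.90, -77.00, zoom))
```

In this sketch the same click near Washington, DC matches several cities in the region when the map is zoomed out and only the nearest place when it is zoomed in, which mirrors the way the zoom level controls how approximate the search is.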
