Supporting rapid processing and interactive map-based exploration of streaming news

We describe the database architecture and system design of NewsStand, a database system that analyzes streaming news and displays it through a map user interface. Special emphasis is given to two components: NewsStand's pipe server, which coordinates independent analysis modules in a processing pipeline, and its relational database schema, which is designed to support responsive spatial querying and retrieval through the user interface. Examples of these spatial queries, which are variants of top-k window queries, are also presented. Experiments on the live NewsStand system demonstrate both its capability to rapidly process large volumes of streaming news and the interactivity of its map user interface, as measured by the performance of its database queries.
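In general, a top-k window query returns, from the objects that lie inside a query rectangle, the k that rank highest under some scoring function. In a map interface the rectangle is naturally the region currently visible on screen. The sketch below illustrates one such query against a relational store using Python's built-in `sqlite3` module. It rests on several assumptions. The table `cluster_locations`, its columns, the `importance` score, the functions `make_db` and `top_k_window`, and the sample rows are all hypothetical and are not NewsStand's actual schema, code, or data. The sketch also assumes the window does not cross the 180th meridian.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical cluster/location table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (news cluster, geotagged location) pair.
    # This is NOT NewsStand's published schema.
    conn.execute(
        "CREATE TABLE cluster_locations ("
        " cluster_id INTEGER, lat REAL, lng REAL, importance REAL)"
    )
    # A plain composite B-tree index; a production system would more likely
    # use a dedicated spatial index for the window predicate.
    conn.execute("CREATE INDEX idx_cluster_loc ON cluster_locations (lat, lng)")
    return conn


def top_k_window(conn, south, west, north, east, k):
    """Return up to k (cluster_id, importance) pairs for clusters with at least
    one location inside the window, most important first.

    Assumes west <= east, i.e. the window does not cross the 180th meridian.
    """
    # GROUP BY ensures that a cluster with several locations in the window
    # occupies only one of the k result slots.
    cur = conn.execute(
        "SELECT cluster_id, MAX(importance) AS imp"
        " FROM cluster_locations"
        " WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?"
        " GROUP BY cluster_id"
        " ORDER BY imp DESC, cluster_id"
        " LIMIT ?",
        (south, north, west, east, k),
    )
    return cur.fetchall()


# Toy data for illustration only.
conn = make_db()
conn.executemany(
    "INSERT INTO cluster_locations VALUES (?, ?, ?, ?)",
    [
        (1, 38.9, -77.0, 120.0),  # cluster 1 geotagged to Washington, D.C.
        (1, 40.7, -74.0, 120.0),  # ...and also to New York
        (2, 39.3, -76.6, 45.0),   # Baltimore
        (3, 51.5, -0.1, 300.0),   # London, outside the window queried below
        (4, 37.5, -77.4, 80.0),   # Richmond
    ],
)

# Window roughly covering the U.S. mid-Atlantic region; ask for the top 2.
result = top_k_window(conn, south=36.0, west=-80.0, north=41.0, east=-73.0, k=2)
print(result)  # → [(1, 120.0), (4, 80.0)]
```

Cluster 1 is returned once even though two of its locations fall in the window, and London's cluster is excluded despite its higher score because it lies outside the window. As the user pans or zooms, only the four window bounds change, so the same parameterized query can be reissued for each new map view.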
