A case study of using geographic cues to predict query news intent

Geographic information retrieval encompasses important tasks such as finding the location of a user and the locations relevant to their search queries. Web search engines receive queries from numerous users located in very different parts of the world. Because a common way for people to find news is through a general web search engine, it is important for search engines to recognize queries with news intent. An important question for geographic information retrieval is how geographic cues can be used to predict the intent of users. This work presents a case study of using geographic features to improve an important web search task: predicting which queries have news intent and are therefore likely to receive clicks on news search results. Our case study suggests that information derived from geographic features can help with this task. The cues we consider are derived from the location of the user (inferred from the IP address), from the location relevant to the query (automatically extracted from the query string), and from the relation between the two locations. We build a classifier that uses these geographic cues to predict whether or not a query will result in a news click. We compare our classifier to a strong baseline that uses non-geographic, click-based features, and show that our classifier outperforms the baseline for geographic queries.
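The abstract does not say how the three kinds of geographic cues are encoded. The following is a minimal sketch, assuming a hypothetical toy gazetteer (`GAZETTEER`), naive whole-phrase matching to extract the query location, and a user location that has already been resolved from the IP address into a (latitude, longitude, country) triple. The function names and features (`same_country`, `distance_km`) are illustrative only, not the paper's actual feature set.

```python
import math

# Hypothetical toy gazetteer: place name -> (latitude, longitude, country code).
# A real system would use a full gazetteer and a proper query geotagger.
GAZETTEER = {
    "london": (51.5074, -0.1278, "GB"),
    "paris": (48.8566, 2.3522, "FR"),
    "new york": (40.7128, -74.0060, "US"),
    "boston": (42.3601, -71.0589, "US"),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def extract_query_location(query):
    """Return (name, entry) for the longest gazetteer name found as a whole
    phrase in the query, or None. Naive: ignores punctuation and ambiguity."""
    padded = " " + query.lower() + " "
    matches = [name for name in GAZETTEER if " " + name + " " in padded]
    if not matches:
        return None
    name = max(matches, key=len)
    return name, GAZETTEER[name]


def geo_features(user_location, query):
    """Build hypothetical geographic features for one (user, query) pair.

    user_location: (lat, lon, country) inferred from the user's IP address,
    or None if the IP address could not be geolocated.
    """
    found = extract_query_location(query)
    feats = {
        "has_user_location": user_location is not None,
        "has_query_location": found is not None,
        "same_country": False,
        "distance_km": -1.0,  # sentinel: relation between locations unknown
    }
    # The relation features are only defined when both locations are known.
    if user_location is not None and found is not None:
        _, (qlat, qlon, qcountry) = found
        ulat, ulon, ucountry = user_location
        feats["same_country"] = ucountry == qcountry
        feats["distance_km"] = haversine_km(ulat, ulon, qlat, qlon)
    return feats


# Usage: a user whose IP address geolocates to Boston issues a query about New York.
boston_user = (42.3601, -71.0589, "US")
print(geo_features(boston_user, "new york fire"))
```

The resulting feature dictionary could be fed to any binary classifier trained on news-click labels; since the abstract does not name the learner, none is assumed here.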
