Analysis of geographic queries in a search engine log

Geography is becoming increasingly important in web search. Search engines can often return better results to users by analyzing features such as user location or geographic terms in web pages and user queries. This is also of great commercial value, as it enables location-specific advertising and improved search for local businesses. As a result, major search companies have invested significant resources in geographic search technologies, often called local search. This paper studies geographic search queries, i.e., text queries such as "hotel new york" that employ geographical terms in an attempt to restrict results to a particular region or location. Our main motivation is to identify opportunities for improving geographical search and related technologies. To this end, we analyze 36 million queries from the recently released AOL query trace. First, we identify typical properties of geographic search (geo) queries based on a manual examination of several thousand queries. Based on these observations, we build a classifier that separates the trace into geo and non-geo queries. We then investigate the properties of geo queries in more detail and relate them to the web sites and users associated with such queries. We also propose a new taxonomy for geographic search queries.
