Who uses web search for what: and how

We analyze a large query log of 2.3 million anonymous registered users from a web-scale U.S. search engine to jointly study their online behavior in terms of who they might be (demographics), what they search for (query topics), and how they search (session analysis). We examine basic demographics from registration information provided by the users, augmented with U.S. census data; analyze basic session statistics; classify queries into types (navigational, informational, transactional) based on click entropy; classify queries into topic categories; and cluster users based on the queries they issued. We then examine the resulting clusters in terms of demographics and search behavior. Our analysis suggests that there are important differences across demographic groups in the topics they search for and in how they search (e.g., white conservatives are users likely to have voted Republican, mostly white males, who search for business-, home-, and gardening-related topics; Baby Boomers tend to be primarily interested in finance, and a large fraction of their sessions consist of simple navigational queries related to online banking). Finally, we examine regional search differences, which seem to correlate with differences in local industries (e.g., gambling-related queries are highest in Las Vegas and lowest in Salt Lake City; searches related to actors are about three times higher in L.A. than in any other region).
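As a rough illustration of the click-entropy signal mentioned above, the sketch below computes the Shannon entropy of the URLs clicked for a single query. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the input format (a flat list of clicked URLs), and the threshold value are all hypothetical, and the abstract does not say how entropy values are mapped to the three query types.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the click distribution for one query.

    `clicked_urls` is a hypothetical input format: one entry per click,
    holding the URL that was clicked after the query was issued.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no clicks observed, so no spread to measure
    # Sum p * log2(1/p) over the distinct clicked URLs.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def has_concentrated_clicks(clicked_urls, threshold=1.0):
    """Flag queries whose clicks concentrate on very few URLs.

    The threshold is an illustrative assumption, not a value from the paper.
    Low click entropy is commonly read as a hint of navigational intent.
    """
    return click_entropy(clicked_urls) < threshold


# Usage: every click lands on the same site versus clicks spread over four sites.
same_site = ["bank.example.com"] * 5
spread = ["a.example", "b.example", "c.example", "d.example"]
print(click_entropy(same_site), has_concentrated_clicks(same_site))
print(click_entropy(spread), has_concentrated_clicks(spread))
```

A query whose clicks all land on one URL has zero entropy, while clicks spread evenly over many URLs push the entropy up. Any cutoff between query types would need to be tuned on real log data.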
