People searching for people: analysis of a people search engine log

Recent years have seen increasing interest in vertical search: searching within a particular type of information. Understanding what people search for in these "verticals" gives direction to research and provides pointers for the search engines themselves. In this paper we analyze the search logs of one particular vertical: people search engines. Based on an extensive analysis of the logs of a search engine geared towards finding people, we propose a classification scheme for people search at three levels: (a) queries, (b) sessions, and (c) users. For queries, we identify three types: (i) event-based high-profile queries (people who become "popular" because of an event), (ii) regular high-profile queries (celebrities), and (iii) low-profile queries (other, less-known people). We present experiments on automatic classification of queries. At the session level, we observe five types: (i) family sessions (users looking for relatives), (ii) event sessions (querying the main players of an event), (iii) spotting sessions (trying to "spot" different celebrities online), (iv) polymerous sessions (sessions without a clear relation between queries), and (v) repetitive sessions (query refinement and copying). Finally, for users we identify four types: (i) monitors, (ii) spotters, (iii) followers, and (iv) polymers. Our findings not only offer insight into search behavior in people search engines, but also help identify future research directions and provide pointers for search engine improvements.
