Penguins in sweaters, or serendipitous entity search on user-generated content

In many cases, when browsing the Web, users are searching for specific information or answers to concrete questions. Sometimes, though, users find unexpected yet interesting and useful results, and are encouraged to explore further. What makes a result serendipitous? We propose to answer this question by exploring the potential of entities extracted from two sources of user-generated content -- Wikipedia, a user-curated online encyclopedia, and Yahoo! Answers, a less constrained question-answering forum -- in promoting serendipitous search. In this work, the content of each data source is represented as an entity network, which is further enriched with metadata about sentiment, writing quality, and topical category. We devise an algorithm based on lazy random walk with restart to retrieve entity recommendations from the networks. We show that our method provides novel results from both datasets, compared to standard web search engines. However, unlike previous research, we find that choosing highly emotional entities does not increase user interest for many categories of entities, suggesting a more complex relationship between subject matter and the desirable metadata attributes in serendipitous search.
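The abstract names a lazy random walk with restart but does not give its formulation, so the following is only a minimal sketch of the generic technique, not the authors' algorithm. It assumes a weighted entity graph stored as nested dictionaries; the function names `lazy_rwr` and `recommend`, the parameter values, and the toy entities are all hypothetical. At each step the walker restarts at the query entity with probability `restart`; otherwise it stays where it is with probability `laziness`, or moves to a neighbor in proportion to edge weight.

```python
def lazy_rwr(graph, seed, restart=0.15, laziness=0.5, iters=200, tol=1e-12):
    """Approximate lazy random-walk-with-restart scores from `seed`.

    graph: dict mapping node -> {neighbor: positive edge weight}.
    All parameter defaults are illustrative assumptions.
    """
    # Collect every node, including ones that only appear as neighbors.
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    if seed not in nodes:
        raise KeyError(f"unknown seed entity: {seed!r}")

    # Start with all probability mass on the seed entity.
    p = {n: 0.0 for n in nodes}
    p[seed] = 1.0

    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for u, mass in p.items():
            if mass == 0.0:
                continue
            keep = (1.0 - restart) * mass  # mass that does not restart
            nbrs = graph.get(u, {})
            total = sum(nbrs.values())
            if total > 0:
                nxt[u] += keep * laziness  # lazy step: stay in place
                for v, w in nbrs.items():
                    nxt[v] += keep * (1.0 - laziness) * w / total
            else:
                nxt[u] += keep  # node without out-links: mass stays put
        nxt[seed] += restart  # p sums to 1, so the restarting mass is `restart`

        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def recommend(graph, seed, k=5, **kwargs):
    """Return the k highest-scoring entities other than the seed itself."""
    scores = lazy_rwr(graph, seed, **kwargs)
    ranked = sorted((n for n in scores if n != seed),
                    key=lambda n: scores[n], reverse=True)
    return ranked[:k]
```

Calling `recommend(graph, "penguin", k=2)` on a small graph whose keys are entity names would return the two entities the walk visits most often when it keeps restarting at "penguin". Metadata such as sentiment or writing quality could, under this sketch, be folded into the edge weights, although the abstract does not say how the authors use it.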
