Probabilistic Prototype Model for Serendipitous Property Mining

Beyond providing relevant information, the web has long played an important role in amusing users. Many web sites provide serendipitous (unexpected but relevant) information to draw user traffic. In this paper, we study a representative scenario: mining an amusing quiz. An existing approach, based on prototype theory in cognitive science, leverages a knowledge base to mine an unexpected property and then finds quiz questions on that property. However, the existing deterministic model is vulnerable to noise in the knowledge base. We therefore propose a probabilistic approach to building a prototype that is robust to such noise. Our extensive empirical study shows that our approach not only significantly outperforms the baselines, by 0.06 in accuracy and 0.11 in serendipity, but also achieves higher relevance than the traditional relevance-pursuing baseline using TF-IDF.
