Scalable Diversified Ranking on Large Graphs

Enhancing the diversity of rankings on graphs has been identified as an important retrieval and mining task. Nevertheless, many existing diversified ranking algorithms either do not scale to large graphs because of their time or memory requirements, or lack an intuitive and well-justified diversified ranking measure. In this paper, we propose a new diversified ranking measure for large graphs that captures both relevance and diversity, and we formulate diversified ranking as a submodular set function maximization problem. Exploiting the submodularity of the proposed measure, we develop an efficient greedy algorithm, with time and space complexity linear in the size of the graph, that achieves near-optimal diversified ranking. In addition, we present a generalized diversified ranking measure and give a near-optimal randomized greedy algorithm, also with linear time and space complexity, for optimizing it. We evaluate the proposed methods through extensive experiments on five real data sets; the results demonstrate the effectiveness and efficiency of the proposed algorithms.
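The abstract does not define the proposed measure, so the sketch below uses a stand-in and should not be read as the authors' method. It assumes a coverage-style diversity score f(S) = |S ∪ N(S)|, the number of nodes that are selected or adjacent to a selected node. This score is monotone and submodular but ignores relevance. It then maximizes f with the standard greedy algorithm in its lazy-evaluation form. For monotone submodular functions under a cardinality constraint, greedy is guaranteed to reach at least a (1 − 1/e) fraction of the optimum, which is the usual sense of "near-optimal" here. The function names `coverage` and `lazy_greedy_top_k` are hypothetical, and the sketch does not reproduce the paper's linear-time bound.

```python
import heapq
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # undirected adjacency lists; every node is a key


def closed_neighborhood(graph: Graph, u: int) -> Set[int]:
    """Node u together with its neighbors."""
    return {u, *graph.get(u, ())}


def coverage(graph: Graph, seeds: Set[int]) -> int:
    """Hypothetical stand-in diversity score f(S) = |S ∪ N(S)| (monotone, submodular)."""
    covered: Set[int] = set()
    for u in seeds:
        covered |= closed_neighborhood(graph, u)
    return len(covered)


def lazy_greedy_top_k(graph: Graph, k: int) -> List[int]:
    """Greedily pick k nodes by marginal coverage gain, with lazy re-evaluation.

    By submodularity a node's gain can only shrink as the selected set grows,
    so a stale heap entry is an upper bound on its current gain. A popped entry
    whose gain was computed in the current round is therefore the true argmax.
    """
    covered: Set[int] = set()
    selected: List[int] = []
    # Heap entries: (-gain, node, size of selected set when gain was computed).
    heap = [(-len(closed_neighborhood(graph, u)), u, 0) for u in graph]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        _, u, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # Gain is exact for the current set: accept the node.
            selected.append(u)
            covered |= closed_neighborhood(graph, u)
        else:
            # Stale upper bound: recompute against the current cover and reinsert.
            gain = len(closed_neighborhood(graph, u) - covered)
            heapq.heappush(heap, (-gain, u, len(selected)))
    return selected


# Two clusters: nodes 0 and 1 are both hubs of the dense cluster {0..4},
# node 5 is the hub of the star {5..8}.
example: Graph = {
    0: [1, 2, 3, 4], 1: [0, 2, 3, 4], 2: [0, 1], 3: [0, 1], 4: [0, 1],
    5: [6, 7, 8], 6: [5], 7: [5], 8: [5],
}
top2 = lazy_greedy_top_k(example, 2)
print(top2, coverage(example, set(top2)))  # [0, 5] 9
```

On this toy graph, a plain degree ranking would return nodes 0 and 1. Both sit in the same dense cluster, so together they cover only five nodes. The greedy selection instead picks one hub per cluster and covers all nine nodes, which is the redundancy-avoiding behavior a diversified ranking is meant to have.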
