Diversified ranking on large graphs: an optimization viewpoint

Diversified ranking on graphs is a fundamental mining task with a variety of high-impact applications. Two important questions remain open. The first concerns the measure: how do we quantify the goodness of a given top-k ranking list so that it captures both relevance and diversity? The second is algorithmic: how do we find, in a scalable way, an optimal or near-optimal top-k ranking list that maximizes this measure? In this paper, we address both challenges from an optimization point of view. First, we propose a goodness measure for a given top-k ranking list. The measure intuitively captures both (a) the relevance between each individual node in the ranking list and the query; and (b) the diversity among the nodes in the ranking list. Second, we propose a scalable algorithm (linear with respect to the size of the graph) that generates a provably near-optimal solution. Experimental evaluations on real graphs demonstrate its effectiveness and efficiency.
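
To make the optimization viewpoint concrete, the following is a minimal illustrative sketch, not the measure or algorithm of the paper. It assumes a hypothetical goodness function of the form "weighted relevance minus pairwise redundancy" and a plain greedy selection. The names `goodness`, `greedy_top_k`, `relevance`, `similarity`, and `weight` are all assumptions introduced for illustration, and the relevance scores are taken as given (for example, from a random walk with restart). This naive version re-evaluates the objective for every candidate, so it is not linear in the size of the graph.

```python
def goodness(selected, relevance, similarity, weight=2.0):
    """Score a ranking list (assumed form): weighted relevance minus redundancy.

    relevance[i]     -- relevance of node i to the query (assumed precomputed)
    similarity[i][j] -- how strongly node i overlaps with node j (assumed given)
    """
    total_relevance = sum(relevance[i] for i in selected)
    redundancy = sum(
        similarity[i][j] * relevance[j] for i in selected for j in selected
    )
    return weight * total_relevance - redundancy


def greedy_top_k(relevance, similarity, k, weight=2.0):
    """Build a top-k list by repeatedly adding the node that yields the
    highest goodness for the extended list (ties go to the lowest index)."""
    selected = []
    candidates = set(range(len(relevance)))
    for _ in range(min(k, len(relevance))):
        best = max(
            sorted(candidates),
            key=lambda c: goodness(selected + [c], relevance, similarity, weight),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: nodes 0 and 1 are near-duplicates, node 2 is distinct.
relevance = [0.5, 0.45, 0.3]
similarity = [
    [0.0, 0.9, 0.0],
    [0.9, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]
top2 = greedy_top_k(relevance, similarity, k=2)
print(top2)
```

On this toy input, ranking by relevance alone would return nodes 0 and 1, while the sketch prefers the less relevant but non-redundant node 2 as its second pick. That is the relevance and diversity trade-off such a measure is meant to encode.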
