Efficient network aware search in collaborative tagging sites

The popularity of collaborative tagging sites presents a unique opportunity to explore keyword search in a context where query results are determined by the opinion of a network of taggers related to a seeker. In this paper, we present the first in-depth study of network-aware search. We investigate efficient top-k processing when the score of an answer is computed as its popularity among members of a seeker's network. We argue that obvious adaptations of top-k algorithms are too space-intensive, due to the dependence of scores on the seeker's network. We therefore develop algorithms based on maintaining score upper-bounds. The global upper-bound approach maintains a single score upper-bound for every pair of item and tag, over the entire collection of users. The resulting bounds are very coarse. We thus investigate clustering seekers based on similar behavior of their networks. We show that finding the optimal clustering of seekers is intractable, but we provide heuristic methods that give substantial time improvements. We then give an optimization that can benefit smaller populations of seekers based on clustering of taggers. Our results are supported by extensive experiments on del.icio.us datasets.
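To make the global upper-bound idea concrete, the following is a minimal Python sketch, not the paper's actual algorithms. It assumes the score of an item for a seeker and a tag is the number of users in the seeker's network who tagged that item with that tag. It stores one bound per (item, tag) pair, taken as the maximum of that score over all seekers, and scans candidates in descending bound order until the k-th best exact score is at least the next bound. All names (`network`, `tagged`, `exact_score`, `build_global_upper_bounds`, `top_k`) and the toy data are hypothetical and chosen only for illustration.

```python
import heapq


def exact_score(item, tag, seeker, network, tagged):
    """Count the members of the seeker's network who tagged `item` with `tag`."""
    return len(network[seeker] & tagged.get((item, tag), set()))


def build_global_upper_bounds(network, tagged):
    """For each (item, tag), store the maximum score over all seekers."""
    bounds = {}
    for key, taggers in tagged.items():
        bounds[key] = max(
            (len(friends & taggers) for friends in network.values()),
            default=0,
        )
    return bounds


def top_k(seeker, tag, k, network, tagged, bounds):
    """Return ([(item, score), ...], number of exact scores computed)."""
    # Candidates for this tag, highest upper bound first (ties broken by item).
    candidates = sorted(
        ((ub, item) for (item, t), ub in bounds.items() if t == tag),
        key=lambda pair: (-pair[0], pair[1]),
    )
    heap = []  # min-heap of (exact score, item) holding the current top k
    examined = 0
    for ub, item in candidates:
        # Stop once no remaining item can beat the current k-th best score.
        # Items that could only tie with it are skipped.
        if len(heap) == k and heap[0][0] >= ub:
            break
        examined += 1
        score = exact_score(item, tag, seeker, network, tagged)
        if len(heap) < k:
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item))
    ranked = sorted(heap, key=lambda pair: (-pair[0], pair[1]))
    return [(item, score) for score, item in ranked], examined


# Toy data (hypothetical): each user's network, and who tagged what.
network = {
    "ann": {"bob", "cat"},
    "bob": {"ann"},
    "cat": {"ann", "dan"},
    "dan": {"cat"},
}
tagged = {
    ("url1", "python"): {"bob", "cat", "dan"},
    ("url2", "python"): {"ann"},
    ("url3", "python"): {"dan"},
    ("url1", "music"): {"ann"},
}
bounds = build_global_upper_bounds(network, tagged)
result, examined = top_k("ann", "python", 1, network, tagged, bounds)
```

In this toy run the scan for seeker `ann` stops after a single exact score computation, because the bound of the next candidate cannot exceed the score already found. The space cost is one number per (item, tag) pair regardless of how many seekers exist, which is the trade-off the abstract describes: compact storage, but bounds that can be far above the true score for most seekers.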
