Identifying the most influential data objects with reverse top-k queries

Top-k queries are widely used to retrieve a ranked set of the k most interesting objects based on individual user preferences. For example, in online marketplaces, customers (users) typically seek a ranked set of products (objects) that satisfy their needs. Reversing a top-k query leads to a query type that instead returns the set of customers who find a product appealing, that is, the customers for whom the product belongs to the top-k result set of their preferences. In this paper, we address the challenging problem of processing queries that identify the top-m most influential products, where the influence of a product is defined as the cardinality of its reverse top-k result set. This definition of influence is useful for market analysis, since it is directly related to the number of customers that value a particular product and, consequently, to its visibility and impact in the market. Existing techniques require processing a reverse top-k query for each object in the database, which is prohibitively expensive even for databases of moderate size. In contrast, we propose two algorithms, SB and BB, for identifying the most influential objects: SB restricts the candidate set of objects that need to be examined, while BB is a branch-and-bound algorithm that retrieves the result incrementally. Furthermore, we propose meaningful variations of the query for most influential objects that are supported by our algorithms. Our experiments demonstrate the efficiency of our algorithms for both synthetic and real-life datasets.
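To make the definition of influence concrete, the following is a minimal sketch of the naive baseline described above, in which the reverse top-k result of every object is computed exhaustively. It is not the SB or BB algorithm. The sketch assumes that each customer preference is a weight vector, that an object is scored by a weighted sum of its attributes, and that lower scores are better; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def top_k(objects: Sequence[Vector], weights: Vector, k: int) -> List[int]:
    """Indices of the k best objects for one customer.

    Assumption: the score is a weighted sum and lower is better.
    Ties are broken by object index.
    """
    scores = [sum(w * x for w, x in zip(weights, obj)) for obj in objects]
    order = sorted(range(len(objects)), key=lambda i: (scores[i], i))
    return order[:k]


def influence_scores(
    objects: Sequence[Vector], preferences: Sequence[Vector], k: int
) -> List[int]:
    """Influence of each object: the number of customers whose top-k
    result contains it, i.e. the size of its reverse top-k result set."""
    counts = [0] * len(objects)
    for weights in preferences:
        for i in top_k(objects, weights, k):
            counts[i] += 1
    return counts


def most_influential(
    objects: Sequence[Vector], preferences: Sequence[Vector], k: int, m: int
) -> List[Tuple[int, int]]:
    """The m objects with the highest influence as (index, influence) pairs.

    This brute-force version evaluates one top-k query per customer over
    the whole database, which is the cost the abstract calls prohibitive.
    """
    counts = influence_scores(objects, preferences, k)
    order = sorted(range(len(objects)), key=lambda i: (-counts[i], i))
    return [(i, counts[i]) for i in order[:m]]
```

For instance, with four two-attribute products `[(1, 5), (2, 2), (5, 1), (4, 4)]`, three customers with weights `[(0.9, 0.1), (0.4, 0.6), (0.1, 0.9)]`, and k = 2, the product at index 1 appears in all three top-2 results and the product at index 2 in two of them, so they are the two most influential products under these assumptions.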
