On the Complexity of Query Result Diversification

Query result diversification is a bi-criteria optimization problem for ranking query results. Given a database D, a query Q, and a positive integer k, the task is to find a set of k tuples from Q(D) that are as relevant as possible to the query and, at the same time, as diverse as possible from one another. Subsets of Q(D) are ranked by an objective function defined in terms of relevance and diversity. Query result diversification has found a variety of applications in databases, information retrieval, and operations research. This article investigates the complexity of result diversification for relational queries. (1) We identify three problems in connection with query result diversification: to determine whether there exists a set of k tuples that is ranked above a bound with respect to relevance and diversity, to assess the rank of a given k-element set, and to count how many k-element sets are ranked above a given bound based on an objective function. (2) We study these problems for a variety of query languages and for the three objective functions proposed in Gollapudi and Sharma [2009]. We establish upper and lower bounds for these problems, all matching, for both combined complexity and data complexity. (3) We also investigate several special settings of these problems, identifying tractable cases. (4) Moreover, we reinvestigate these problems in the presence of compatibility constraints commonly found in practice, and provide their complexity in all these settings.
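To make the three problems concrete, the following is a minimal brute-force sketch in Python, not the article's method. It assumes the query result Q(D) has already been materialized as a small list of tuple ids, and it uses an illustrative objective that only loosely resembles a max-sum style combination of relevance and pairwise distance. The names RELEVANCE, distance, score, exists_above, rank_of, and count_above, the trade-off parameter lam, the toy data, and the reading of "ranked above a bound" as "score at least the bound" are all assumptions introduced here for illustration.

```python
from itertools import combinations

# Hypothetical toy data standing in for Q(D): tuple ids with assumed relevance scores.
RELEVANCE = {0: 1.0, 1: 0.5, 5: 0.2}


def distance(a, b):
    """Assumed pairwise distance between two tuples (here: absolute difference of ids)."""
    return abs(a - b)


def score(subset, lam=0.5):
    """Illustrative objective: weighted sum of total relevance and total pairwise distance."""
    k = len(subset)
    rel = sum(RELEVANCE[t] for t in subset)
    div = sum(distance(a, b) for a, b in combinations(subset, 2))
    return (k - 1) * (1 - lam) * rel + lam * div


def exists_above(results, k, bound):
    """Decision problem: is there a k-element set whose score is at least `bound`?"""
    return any(score(c) >= bound for c in combinations(results, k))


def rank_of(results, subset):
    """Rank problem: 1 + number of sets of the same size that score strictly higher."""
    own = score(subset)
    return 1 + sum(1 for c in combinations(results, len(subset)) if score(c) > own)


def count_above(results, k, bound):
    """Counting problem: how many k-element sets have a score of at least `bound`?"""
    return sum(1 for c in combinations(results, k) if score(c) >= bound)


results = [0, 1, 5]
print(exists_above(results, 2, 3.0))
print(rank_of(results, (1, 5)))
print(count_above(results, 2, 2.0))
```

The sketch enumerates every k-element subset of the result, so its running time grows with the binomial coefficient C(n, k). It is meant only to fix intuition about what is being decided, ranked, and counted.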

[1] Georgia Koutrika, et al. A survey on representation, composition and application of preferences in database systems, 2011, TODS.

[2] Wolfgang Nejdl, et al. Current Approaches to Search Result Diversification, 2009, LivingWeb@ISWC.

[3] Jon Whittle, et al. CARD: a decision-guidance framework and application for recommending composite alternatives, 2008, RecSys '08.

[4] Oleg A. Prokopyev, et al. The equitable dispersion problem, 2009, Eur. J. Oper. Res.

[5] Surajit Chaudhuri. Generalization and a framework for query modification, 1990, [1990] Proceedings. Sixth International Conference on Data Engineering.

[6] Yi Chen, et al. Structured Search Result Differentiation, 2009, Proc. VLDB Endow.

[7] Fabrizio Silvestri, et al. Efficient Diversification of Web Search Results, 2011, Proc. VLDB Endow.

[8] Larry J. Stockmeyer, et al. The Polynomial-Time Hierarchy, 1976, Theor. Comput. Sci.

[9] Serge Abiteboul, et al. Foundations of Databases, 1994.

[10] Ricardo A. Baeza-Yates, et al. New Stochastic Algorithms for Scheduling Ads in Sponsored Search, 2007, 2007 Latin American Web Conference (LA-WEB 2007).

[11] Davide Martinenghi, et al. Top-k bounded diversification, 2012, SIGMOD Conference.

[12] Cong Yu, et al. It takes variety to make a world: diversification in recommender systems, 2009, EDBT '09.

[13] Salil P. Vadhan, et al. Computational Complexity, 2005, Encyclopedia of Cryptography and Security.

[14] Yehoshua Sagiv, et al. An incremental algorithm for computing ranked full disjunctions, 2005, PODS '05.

[15] Aditya G. Parameswaran, et al. Evaluating, combining and generalizing recommendations with prerequisites, 2010, CIKM.

[16] Moshe Y. Vardi. The complexity of relational query languages (Extended Abstract), 1982, STOC '82.

[17] Gediminas Adomavicius, et al. Multidimensional Recommender Systems: A Data Warehousing Approach, 2001, WELCOM.

[18] Sean M. McNee, et al. Improving recommendation lists through topic diversification, 2005, WWW '05.

[19] Gediminas Adomavicius, et al. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions, 2005, IEEE Transactions on Knowledge and Data Engineering.

[20] Mi Zhang, et al. Avoiding monotony: improving the diversity of recommendation lists, 2008, RecSys '08.

[21] Juliana Freire, et al. Supporting Exploratory Queries in Databases, 2004, DASFAA.

[22] Evaggelia Pitoura, et al. PerK: personalized keyword search in relational databases through preferences, 2010, EDBT '10.

[23] Theodoros Lappas, et al. Finding a team of experts in social networks, 2009, KDD.

[24] Georgia Koutrika, et al. FlexRecs: expressing and combining flexible recommendations, 2009, SIGMOD Conference.

[25] Wenfei Fan, et al. On the complexity of package recommendation problems, 2012, PODS '12.

[26] Kevin Chen-Chuan Chang, et al. RankSQL: Supporting Ranking Queries in Relational Database Management Systems, 2005, VLDB.

[27] Neoklis Polyzotis, et al. Evaluating rank joins with optimal cost, 2008, PODS.

[28] Laks V. S. Lakshmanan, et al. Breaking out of the box of recommendations: from items to packages, 2010, RecSys '10.

[29] Moni Naor, et al. Optimal aggregation algorithms for middleware, 2001, PODS '01.

[30] Richard E. Ladner. Polynomial Space Counting Problems, 1989, SIAM J. Comput.

[31] Divesh Srivastava, et al. On query result diversification, 2011, 2011 IEEE 27th International Conference on Data Engineering.

[32] Laks V. S. Lakshmanan, et al. Composite recommendations: from items to packages, 2012, Frontiers of Computer Science.

[33] Michael Wooldridge, et al. On the computational complexity of qualitative coalitional games, 2004, Artif. Intell.

[34] Sreenivas Gollapudi, et al. An axiomatic approach for result diversification, 2009, WWW '09.

[35] Jignesh M. Patel, et al. Efficient and generic evaluation of ranked queries, 2011, SIGMOD '11.

[36] Jorge Lobo, et al. Qualifying Answers According to User Needs and Preferences, 1997, Fundam. Informaticae.

[37] Evaggelia Pitoura, et al. Diversity over Continuous Data, 2009, IEEE Data Eng. Bull.

[38] Gerhard Weikum, et al. ACM Transactions on Database Systems, 2005.

[39] Leslie G. Valiant, et al. The Complexity of Computing the Permanent, 1979, Theor. Comput. Sci.

[40] Andrei Voronkov, et al. Complexity of nonrecursive logic programs with complex values, 1998, PODS.

[41] Yehoshua Sagiv, et al. Optimizing and parallelizing ranked enumeration, 2011, Proc. VLDB Endow.

[42] Sihem Amer-Yahia, et al. Efficient Computation of Diverse Query Results, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[43] Surajit Chaudhuri, et al. Ranking objects based on relationships and fixed associations, 2009, EDBT '09.

[44] Mark W. Krentel. Generalizations of Opt P to the Polynomial Hierarchy, 1992, Theor. Comput. Sci.

[45] Tao Li, et al. Addressing diverse user preferences in SQL-query-result navigation, 2007, SIGMOD '07.

[46] Surajit Chaudhuri, et al. On the equivalence of recursive and nonrecursive datalog programs, 1992, J. Comput. Syst. Sci.

[47] Cong Yu, et al. Group Recommendation: Semantics and Efficiency, 2009, Proc. VLDB Endow.

[48] Wenfei Fan, et al. On the Complexity of Query Result Diversification, 2014, ACM Trans. Database Syst.

[49] John R. Smith, et al. Supporting Incremental Join Queries on Ranked Inputs, 2001, VLDB.

[50] Anthony K. H. Tung, et al. Relaxing join and selection queries, 2006, VLDB.

[51] Peter Fankhauser, et al. DivQ: diversification for keyword search over structured databases, 2010, SIGIR.

[52] Phokion G. Kolaitis, et al. Subtractive reductions and complete problems for counting complexity classes, 2000.

[53] Sreenivas Gollapudi, et al. Diversifying search results, 2009, WSDM '09.

[54] Ihab F. Ilyas, et al. A survey of top-k query processing techniques in relational database systems, 2008, CSUR.

[55] Sihem Amer-Yahia, et al. Recommendation Projects at Yahoo!, 2011, IEEE Data Eng. Bull.

[56] Aditya G. Parameswaran, et al. Recommendation systems with complex constraints: A course recommendation perspective, 2011, TOIS.

[57] Evaggelia Pitoura, et al. Search result diversification, 2010, SGMD.

[58] Heribert Vollmer, et al. The satanic notations, 1995, SIGACT News.

[59] Recommendation Diversification Using Explanations, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[60] Sihem Amer-Yahia, et al. Complexity and algorithms for composite retrieval, 2013, WWW '13 Companion.

[61] Gena Hahn, et al. Counting feasible solutions of the traveling salesman problem with pickups and deliveries is #P-complete, 2009, Discret. Appl. Math.

[62] Yuli Ye, et al. Max-Sum diversification, monotone submodular functions and dynamic updates, 2012, PODS '12.