Querying Probabilistic Preferences in Databases

We propose a novel framework wherein probabilistic preferences can be naturally represented and analyzed in a probabilistic relational database. The framework augments the relational schema with a special type of relation symbol, called a preference symbol. A deterministic instance of this symbol holds a collection of binary relations. Abstractly, the probabilistic variant is a probability space over databases of the augmented form (i.e., a probabilistic database). Effectively, each instance of a preference symbol can be represented as a collection of parametric preference distributions, such as Mallows models. We establish positive and negative complexity results for evaluating Conjunctive Queries (CQs) over databases where preferences are represented in the Repeated Insertion Model (RIM), of which Mallows is a special case. We show how CQ evaluation reduces to a novel inference problem (of independent interest) over RIM, and devise a solver with polynomial data complexity.
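
To make the relationship between RIM and Mallows concrete, the following is a minimal sketch of sampling a ranking by repeated insertion. It is not the solver described above and does not evaluate queries; it only illustrates the generative model. The function names (`mallows_insertion_probs`, `sample_rim`, `sample_mallows`) and the dispersion parameter name `phi` are illustrative assumptions, not notation taken from the paper. The sketch relies on the standard characterization that a Mallows model with dispersion `phi` is the RIM in which the i-th item of the reference ranking is inserted at position j (1 ≤ j ≤ i) with probability proportional to `phi ** (i - j)`.

```python
import random


def mallows_insertion_probs(i, phi):
    """Insertion distribution for the i-th reference item (1-indexed).

    Position j in 1..i receives weight phi**(i - j), normalized to sum to 1.
    With phi = 0 all mass falls on j = i (item appended at the end);
    with phi = 1 the distribution is uniform over the i positions.
    """
    weights = [phi ** (i - j) for j in range(1, i + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_rim(reference, insertion_probs, rng=random):
    """Sample one ranking from a Repeated Insertion Model.

    reference: list of items in reference order.
    insertion_probs: hypothetical callback; insertion_probs(i) returns a
        list of i probabilities for inserting the i-th item at positions 1..i.
    rng: the random module or a random.Random instance.
    """
    ranking = []
    for i, item in enumerate(reference, start=1):
        probs = insertion_probs(i)
        # Draw the 1-indexed insertion position for this item.
        j = rng.choices(range(1, i + 1), weights=probs)[0]
        ranking.insert(j - 1, item)
    return ranking


def sample_mallows(reference, phi, rng=random):
    """Sample from Mallows by plugging its insertion distribution into RIM."""
    return sample_rim(reference, lambda i: mallows_insertion_probs(i, phi), rng)
```

For example, `sample_mallows(["a", "b", "c"], 0.3, random.Random(0))` draws one ranking of the three items concentrated around the reference order; smaller values of `phi` concentrate the distribution more tightly.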
