Uncovering the riffled independence structure of ranked data

Representing distributions over permutations is a daunting task because the number of permutations of n objects scales factorially in n. One recent approach to reducing storage complexity exploits probabilistic independence, but as we argue, full independence assumptions impose strong sparsity constraints on distributions and are unsuitable for modeling rankings. We identify a novel class of independence structures, called riffled independence, which encompasses a more expressive family of distributions while retaining many of the properties necessary for efficient inference and reduced sample complexity. In riffled independence, one draws two permutations independently, then combines them into a single permutation using the riffle shuffle common in card games. In the context of ranking, riffled independence corresponds to ranking disjoint sets of objects independently, then interleaving those rankings. In this paper, we provide a formal introduction to riffled independence and propose an automated method for discovering sets of items that are riffle independent from a training set of rankings. We show that our clustering-like algorithms can be used to discover meaningful latent coalitions from real preference ranking datasets and to learn the structure of hierarchically decomposable models based on riffled independence.

AMS 2000 subject classifications: Primary 68T37, 60C05; secondary 60B15.
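
The generative process described in the abstract (rank two disjoint item sets independently, then interleave the two rankings) can be illustrated with a minimal Python sketch. The function names `riffle` and `sample_riffled_ranking` are hypothetical, and the uniform choices for the two rankings and for the interleaving are assumptions made only for illustration; the paper's models need not use uniform distributions for any of these components.

```python
import random


def riffle(rank_a, rank_b, slots_a):
    """Interleave two rankings into one.

    slots_a is the set of positions (0-based) in the joint ranking that
    are filled by items of rank_a; the remaining positions are filled by
    items of rank_b. Relative order within each ranking is preserved.
    """
    n = len(rank_a) + len(rank_b)
    it_a, it_b = iter(rank_a), iter(rank_b)
    return [next(it_a) if i in slots_a else next(it_b) for i in range(n)]


def sample_riffled_ranking(items_a, items_b, rng):
    """Draw one joint ranking in which items_a and items_b are ranked
    independently and then interleaved.

    Assumption: all three components (ranking of A, ranking of B, and the
    interleaving) are drawn uniformly, purely for illustration.
    """
    rank_a = list(items_a)
    rank_b = list(items_b)
    rng.shuffle(rank_a)  # independent ranking of the first item set
    rng.shuffle(rank_b)  # independent ranking of the second item set
    n = len(rank_a) + len(rank_b)
    # Choose which positions of the joint ranking belong to the first set.
    slots_a = set(rng.sample(range(n), len(rank_a)))
    return riffle(rank_a, rank_b, slots_a)


if __name__ == "__main__":
    rng = random.Random(0)
    fruits = ["apple", "banana"]
    vegetables = ["carrot", "leek", "pea"]
    print(sample_riffled_ranking(fruits, vegetables, rng))
```

Separating `riffle` from the sampling step keeps the interleaving explicit, so a non-uniform interleaving distribution could be substituted without touching the two independent rankings.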
