Wallenius Bayes

This paper introduces a new event model for classifying binary data generated by a “destructive choice” process, as found in certain kinds of human behavior. In such a process, making a choice removes that option from future consideration but does not change the relative probabilities of the remaining options in the choice set. The proposed Wallenius event model is based on a largely forgotten non-central hypergeometric distribution introduced by Wallenius (Biased sampling: the non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University, 1963). We discuss its relationship to models of how human choice behavior is generated, highlighting a key, simple mathematical property. We use this background to explain why the traditional multivariate Bernoulli and multinomial naive Bayes models are each suboptimal for such data. We then present an implementation of naive Bayes based on the Wallenius event model and show experimentally that, for data whose features we would expect to be generated by destructive choice behavior, Wallenius Bayes outperforms the traditional versions of naive Bayes for prediction based on these features. We also show that it is competitive with non-naive methods, in particular support-vector machines. In contrast, Wallenius Bayes underperforms when the data-generating process is not based on destructive choice.
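
The destructive choice process described above can be illustrated with a minimal sketch of the univariate Wallenius distribution. This is not the paper's implementation. It assumes a simplified two-group setting in which items of one group carry weight `omega` and items of the other carry weight 1, and it computes the exact probabilities by stepping through the draws one at a time. The function name `wallenius_pmf` and its parameters `m1`, `m2`, `n`, and `omega` are hypothetical names chosen for this illustration.

```python
from math import comb


def wallenius_pmf(m1, m2, n, omega):
    """Sketch: P(x of n draws are type-1) under biased sampling without replacement.

    Assumed setting (not taken from the paper): m1 type-1 items of weight
    `omega`, m2 type-2 items of weight 1. Each draw picks one remaining item
    with probability proportional to its weight and then removes it, so the
    relative odds among the items still present never change.
    Returns a list p with p[x] = P(X = x) for x = 0..n.
    """
    if omega <= 0:
        raise ValueError("omega must be positive")
    if not 0 <= n <= m1 + m2:
        raise ValueError("n must lie between 0 and m1 + m2")

    probs = [1.0] + [0.0] * n  # distribution of x after 0 draws
    for k in range(n):  # k draws have been made so far
        nxt = [0.0] * (n + 1)
        for x in range(k + 1):
            p = probs[x]
            if p == 0.0:
                continue  # unreachable state
            r1 = m1 - x          # type-1 items still available
            r2 = m2 - (k - x)    # type-2 items still available
            total = omega * r1 + r2
            if r1 > 0:
                nxt[x + 1] += p * omega * r1 / total
            if r2 > 0:
                nxt[x] += p * r2 / total
        probs = nxt
    return probs


if __name__ == "__main__":
    # With omega = 1 the result coincides with the ordinary hypergeometric pmf.
    unbiased = wallenius_pmf(3, 4, 3, 1.0)
    reference = [comb(3, x) * comb(4, 3 - x) / comb(7, 3) for x in range(4)]
    print(unbiased)
    print(reference)
    # With omega > 1, type-1 items are favored and mass shifts to larger x.
    print(wallenius_pmf(3, 4, 3, 2.0))
```

Setting `omega` to 1 recovers the central hypergeometric distribution, which is a convenient sanity check. A full event model for naive Bayes would need the multivariate form with one weight per feature and class, which this sketch does not attempt.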
