Wallenius Naive Bayes

Traditional event models underlying naive Bayes classifiers assume probability distributions that are not appropriate for binary data generated by human behaviour. In this work, we develop a new event model based on a somewhat forgotten distribution introduced by Kenneth Ted Wallenius in 1963. On a collection of Facebook datasets, where the task is to predict personality traits from likes, we show that the new model achieves superior performance while using less data.
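To make the underlying distribution concrete, the following is a minimal sketch of the univariate Wallenius noncentral hypergeometric probability mass function, which describes drawing items without replacement when one colour is `omega` times as likely to be picked as the other. It is not the event model developed in this work. The function name `wallenius_pmf`, its parameter names, and the use of Simpson's rule for the integral are illustrative assumptions, and accuracy may degrade for non-integer or very small `omega`.

```python
from math import comb


def wallenius_pmf(x, n, m1, m2, omega, steps=2000):
    """Probability of drawing x colour-1 items in n biased draws without replacement.

    m1 items of colour 1 carry weight omega, m2 items of colour 2 carry weight 1.
    Uses the integral form of the pmf:
        C(m1, x) * C(m2, y) * integral_0^1 (1 - t^(omega/d))^x (1 - t^(1/d))^y dt
    with y = n - x and d = omega * (m1 - x) + (m2 - y).
    """
    y = n - x
    if x < 0 or y < 0 or x > m1 or y > m2:
        return 0.0  # outside the support
    d = omega * (m1 - x) + (m2 - y)
    if d == 0:
        return 1.0  # every item was drawn, so the outcome is certain

    # Substitute t = u^r (r >= 1) so the integrand stays bounded at u = 0.
    r = max(d, 1.0)
    a, b = r * omega / d, r / d

    def integrand(u):
        return (1 - u**a) ** x * (1 - u**b) ** y * r * u ** (r - 1)

    # Composite Simpson's rule on [0, 1]; steps must be even.
    h = 1.0 / steps
    total = integrand(0.0) + integrand(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return comb(m1, x) * comb(m2, y) * total * h / 3


if __name__ == "__main__":
    # 3 colour-1 items (weight 2), 4 colour-2 items (weight 1), 3 draws.
    for k in range(4):
        print(k, round(wallenius_pmf(k, 3, 3, 4, 2.0), 4))
```

With `omega = 1` the weights are equal and the function reduces to the ordinary (central) hypergeometric distribution, which gives a quick sanity check on the sketch.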
