Predicting and explaining behavioral data with structured feature space decomposition

Modeling human behavioral data is challenging due to its scale, sparseness (few observations per individual), heterogeneity (individuals who behave differently), and class imbalance (few observations of the outcome of interest). A further challenge is learning an interpretable model that not only predicts outcomes accurately but also identifies the important factors associated with a given behavior. To address these challenges, we describe a statistical approach to modeling behavioral data called the structured sum-of-squares decomposition (S3D). The algorithm, inspired by decision trees, selects important features that collectively explain the variation in the outcome, quantifies correlations between those features, and bins the subspace of important features into smaller, more homogeneous blocks that correspond to similarly behaving subgroups within the population. This partitioned subspace allows us to predict and analyze the behavior of the outcome variable both statistically and visually, providing a means to examine the effect of individual features and to create explainable predictions. We apply S3D to learn models of online activity from large-scale data collected from diverse sites, including Stack Exchange, Khan Academy, Twitter, Duolingo, and Digg. We show that S3D creates parsimonious models that predict outcomes on held-out data at levels comparable to state-of-the-art approaches, while also producing interpretable models that yield insights into behavior. Interpretability matters not only for informing strategies aimed at changing behavior and for designing social systems, but also for explaining predictions, a critical step toward minimizing algorithmic bias.
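The decomposition described above can be illustrated with a minimal sketch. This is not the authors' exact S3D algorithm, only a toy greedy procedure in its spirit: at each step it quantile-bins one candidate feature, crosses those bins with the current partition, and keeps the feature whose refined partition most reduces the within-block sum of squares. The function and variable names (`s3d_sketch`, `within_sse`) are hypothetical.

```python
import numpy as np

def within_sse(y, labels):
    # Sum of squared errors of y around each block's mean.
    return sum(((y[labels == b] - y[labels == b].mean()) ** 2).sum()
               for b in np.unique(labels))

def s3d_sketch(X, y, n_features=2, n_bins=4):
    """Toy greedy, tree-inspired feature selection: repeatedly add the
    feature whose quantile bins, crossed with the current partition,
    most reduce the within-block sum of squares of the outcome y."""
    chosen, labels = [], np.zeros(len(y), dtype=int)
    for _ in range(n_features):
        best_j, best_sse, best_labels = None, np.inf, None
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            # Quantile bin edges for feature j (interior edges only).
            edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
            bins = np.digitize(X[:, j], edges)
            # Cross the new bins with the existing blocks.
            candidate = labels * n_bins + bins
            sse = within_sse(y, candidate)
            if sse < best_sse:
                best_j, best_sse, best_labels = j, sse, candidate
        chosen.append(best_j)
        labels = best_labels
    return chosen, labels
```

The returned `labels` array assigns each observation to a block of the partitioned feature subspace; block means can then serve as predictions, and the ordered `chosen` list indicates which features explain the most variation.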
