Analyzing Behavioral Big Data: Methodological, practical, ethical, and moral issues

The term “Big Data” evokes emotions ranging from excitement to exasperation in the statistics community. Looking beyond these emotions reveals several important changes that affect us as statisticians and as humans. I focus on Behavioral Big Data (BBD): very large, rich, multidimensional datasets on human behaviors, actions, and interactions that have become available to companies, governments, and researchers. The paper describes the BBD landscape and examines the opportunities and critical issues that arise when applying statistical and data mining approaches to BBD, including the move from macro- to micro-decisioning and its implications.
