Big Data and Data Science Methods for Management Research

The recent advent of remote sensing, mobile technologies, novel transaction systems, and high-performance computing offers opportunities to understand trends, behaviors, and actions in a manner that has not been previously possible. Researchers can thus leverage “big data” that are generated from a plurality of sources including mobile transactions, wearable technologies, social media, ambient networks, and business transactions. An earlier Academy of Management Journal (AMJ) editorial explored the potential implications for data science in management research and highlighted questions for management scholarship as well as the attendant challenges of data sharing and privacy (George, Haas, & Pentland, 2014). This nascent field is evolving rapidly and at a speed that leaves scholars and practitioners alike attempting to make sense of the emergent opportunities that big data hold. With the promise of big data come questions about the analytical value and thus relevance of these data for theory development, including concerns over their context-specific relevance, reliability, and validity. To address this challenge, data science is emerging as an interdisciplinary field that combines statistics, data mining, machine learning, and analytics to understand and explain how we can generate analytical insights and prediction models from structured and unstructured big data. Data science emphasizes the systematic study of the organization, properties, and analysis of data and their role in inference, including our confidence in the inference (Dhar, 2013). Although the terms “big data” and “data science” are often used interchangeably, “big data” refer to large and varied data that can be collected and managed, whereas “data science” develops models that capture, visualize, and analyze the underlying patterns in the data. In this editorial, we address both the collection and handling of big data and the analytical tools provided by data science for management scholars.
At present, practitioners suggest that data science applications tackle the three core elements of big data: volume, velocity, and variety (McAfee & Brynjolfsson, 2012; Zikopoulos & Eaton, 2011). “Volume” represents the sheer size of the dataset due to the aggregation of a large number of variables and an even larger set of observations for each variable. “Velocity” reflects the speed at which these data are collected and analyzed, whether in real time or near real time from sensors, sales transactions, social media posts, and sentiment data for breaking news and social trends. “Variety” in big data comes from the plurality of structured and unstructured data sources such as text, videos, networks, and graphics, among others. The combinations of volume, velocity, and variety reveal the complex task of generating knowledge from big data, which often run into millions of observations, and deriving theoretical contributions from such data. In this editorial, we provide a primer or a “starter kit” for potential data science applications in management research. We do so with a caveat that emerging fields outdate and improve upon methodologies while often supplanting them with new applications. Nevertheless, this primer can guide management scholars who wish to use data science techniques to reach better answers to existing questions or explore completely new research questions.

[1]  Vignesh Prajapati,et al.  Big Data Analytics with R and Hadoop , 2013 .

[2]  Erik Brynjolfsson,et al.  Big data: the management revolution. , 2012, Harvard business review.

[3]  Robert Tibshirani,et al.  An Introduction to the Bootstrap , 1994 .

[4]  Leif D. Nelson,et al.  Specification Curve: Descriptive and Inferential Statistics on All Reasonable Specifications , 2015 .

[5]  Vasant Dhar,et al.  Data science and prediction , 2012, CACM.

[6]  E PaterdeI.,et al.  Psychological and physiological reactions to high workloads: implications for well-being , 2010 .

[7]  Purnamrita Sarkar,et al.  A scalable bootstrap for massive data , 2011, 1112.5016.

[8]  E. George,et al.  Journal of the American Statistical Association is currently published by American Statistical Association. , 2007 .

[9]  N. Yee,et al.  The Digital Workforce and the Workplace of the Future , 2016 .

[10]  Catherine E. Tucker,et al.  When Does Retargeting Work? Information Specificity in Online Advertising , 2013 .

[11]  G. George,et al.  Managing by Design , 2015 .

[12]  Gerard J. Tellis,et al.  Does Chatter Really Matter? Dynamics of User-Generated Content and Stock Performance , 2011, Mark. Sci..

[13]  Thorsten Wiesel,et al.  Practice Prize Paper - Marketing's Profit Impact: Quantifying Online and Off-line Funnel Progression , 2011, Mark. Sci..

[14]  Kang Liu,et al.  Book Review: Sentiment Analysis: Mining Opinions, Sentiments, and Emotions by Bing Liu , 2015, CL.

[15]  Gregory J. Park,et al.  Automatic personality assessment through social media language. , 2015, Journal of personality and social psychology.

[16]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[17]  W. J. Becker,et al.  Hot Buttons and Time Sinks: The Effects of Electronic Communication During Nonwork Time on Emotions and Work-Nonwork Conflict , 2015 .

[18]  Samuel Madden,et al.  From Databases to Big Data , 2012, IEEE Internet Comput..

[19]  M. Haas,et al.  Information, Attention, and Decision Making , 2015 .

[20]  Mike Y. Chen,et al.  Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web , 2001 .

[21]  Hal R. Varian,et al.  Big Data: New Tricks for Econometrics , 2014 .

[22]  R. Pieters,et al.  Emotion-Induced Engagement in Internet Video Advertisements , 2012 .

[23]  Andrew B. Whinston,et al.  Path to Purchase: A Mutually Exciting Point Process Model for Online Advertising and Conversion , 2012, Manag. Sci..

[24]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[25]  Roger Calantone,et al.  The Promise and Perils of Wearable Sensors in Organizational Research , 2017 .

[26]  Joyce E. Bono,et al.  Building Positive Resources: Effects of Positive Events and Positive Reflection on Work Stress and Health , 2013 .

[27]  Gerard George,et al.  Managing digital money , 2015 .

[28]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[29]  P. Essens,et al.  Managing Risk and Resilience , 2015 .

[30]  M. Wedel,et al.  Marketing Analytics for Data-Rich Environments , 2016 .

[31]  Roland T. Rust,et al.  My Mobile Music: An Adaptive Personalization System For Digital Audio Players , 2007 .

[32]  J. Friedman,et al.  A Statistical View of Some Chemometrics Regression Tools , 1993 .

[33]  Ming-Hui Chen,et al.  Statistical methods and computing for big data. , 2015, Statistics and its interface.

[34]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[35]  Vignesh Prajapati.  Big data analytics with R and Hadoop : set up an integrated infrastructure of R and Hadoop to turn your data analytics into big data analytics , 2013 .

[36]  Alex Pentland,et al.  Big Data and Management , 2014 .

[37]  M. Haas,et al.  Which Problems to Solve? Online Knowledge Sharing and Attention Allocation in Organizations , 2014 .

[38]  Tim Loughran,et al.  When is a Liability not a Liability? Textual Analysis, Dictionaries, and 10-Ks , 2010 .

[39]  David H. Reiley,et al.  Online ads and offline sales: measuring the effect of retail advertising via a controlled experiment on Yahoo! , 2014 .

[40]  R. Bucklin,et al.  Modeling Purchase Behavior at an E-Commerce Web Site: A Task-Completion Approach , 2004 .

[41]  Arvind Rangaswamy,et al.  Sampling Designs for Recovering Local and Global Characteristics of Social Networks , 2015 .