Analyzing Big Data in Psychology: A Split/Analyze/Meta-Analyze Approach

Big data is a field that has traditionally been dominated by disciplines such as computer science and business, where analyses have been mainly data-driven. Psychology, a discipline that places a strong emphasis on behavioral theories and empirical research, has the potential to contribute greatly to the big data movement. However, one challenge for psychologists, and probably the most crucial one, is that most researchers may not have the programming and computational skills needed to analyze big data. In this article, we argue that psychologists can also conduct big data research and that, rather than trying to acquire new programming and computational skills, they should focus on their strengths, such as performing psychometric analyses and testing theories with multivariate analyses to explain phenomena. We propose a split/analyze/meta-analyze approach that allows psychologists to analyze big data easily: the data are split into smaller, manageable datasets, each is analyzed separately, and the results are combined with meta-analysis. Two real datasets are used to demonstrate the proposed procedures in R. A new research agenda for the analysis of big data in psychology is outlined at the end of the article.
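The three steps named in the abstract can be illustrated with a small example. The paper demonstrates its procedures in R; what follows is only a minimal Python sketch, not the authors' code. It assumes that the effect size of interest is a Pearson correlation, that each split is analyzed with Fisher's z-transformation, and that the splits are pooled with fixed-effects inverse-variance weights. The function names `pearson_r` and `split_analyze_meta`, the simulated data, and the choice of 10 splits are all hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_analyze_meta(xs, ys, k):
    """Split the data into k parts, analyze each part, and pool the results.

    Returns the pooled correlation and the standard error of the pooled
    Fisher z value (assumption: fixed-effects model, correlation effect size).
    """
    # Split: every k-th row goes to the same part, so all rows are used.
    # This assumes the rows are in no meaningful order.
    parts = [(xs[i::k], ys[i::k]) for i in range(k)]

    # Analyze: Fisher z and its sampling variance 1 / (n_i - 3) per part.
    z_values = []
    variances = []
    for part_x, part_y in parts:
        z_values.append(math.atanh(pearson_r(part_x, part_y)))
        variances.append(1.0 / (len(part_x) - 3))

    # Meta-analyze: inverse-variance weighted average of the z values.
    weights = [1.0 / v for v in variances]
    pooled_z = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    se_z = math.sqrt(1.0 / sum(weights))
    return math.tanh(pooled_z), se_z


# Simulated data (an assumption for illustration): population correlation 0.5.
random.seed(2016)
rho = 0.5
n = 20_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for xi in x]

pooled_r, se_z = split_analyze_meta(x, y, k=10)
full_r = pearson_r(x, y)
print(f"pooled r from 10 splits: {pooled_r:.4f} (SE of z: {se_z:.4f})")
print(f"r from the full dataset: {full_r:.4f}")
```

In this sketch the pooled estimate should be very close to the correlation computed on the full dataset, which is the point of the approach: each split is small enough to analyze with familiar tools, and the meta-analytic step recovers an overall estimate. A random-effects model would be the natural alternative when the splits are expected to differ systematically, for example when the data are split by country or time period rather than arbitrarily.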
