Methodological Transparency and Big Data: A Critical Comparative Analysis of Institutionalization

Big data is increasingly employed in predictive social analyses, yet highly visible instances of unreliable models and failed predictions raise questions about the methodological validity of data-driven approaches. A meta-analysis of methodological institutionalization across three scholarly disciplines provides evidence that traditional statistical methods, which are more institutionalized and consistent, are important for developing, structuring, and institutionalizing data science approaches to new, large-n quantitative research. This indicates that data-driven research approaches may be limited in reliability, validity, generalizability, and interpretability. Results also indicate that interdisciplinary collaborations describe their methods in significantly greater detail on projects employing big data, with the effect that institutionalization makes data science approaches more transparent.
