The role of administrative data in the big data revolution in social science research.

The term big data is currently a buzzword in social science, but its precise meaning is ambiguous. In this paper we focus on administrative data, which is a distinctive form of big data. New administrative data resources will afford exciting opportunities for social science research, but these are currently underappreciated by the research community. The central aim of this paper is to discuss the challenges associated with administrative data. We emphasise that it is critical for researchers to carefully consider how administrative data has been produced. We conclude that administrative datasets have the potential to contribute to the development of high-quality and impactful social science research, and should not be overlooked in the emerging field of big data.
