Finding the traces of behavioral and cognitive processes in big data and naturally occurring datasets

Today, people generate and store more data than ever before as they interact with both real and virtual environments. These digital traces of behavior and cognition offer cognitive scientists and psychologists an unprecedented opportunity to test theories outside the laboratory. Despite researchers' general excitement about big data and naturally occurring datasets, three “gaps” stand in the way of their wider adoption in theory-driven research: the imagination gap, the skills gap, and the culture gap. We outline an approach to bridging these three gaps while respecting our responsibilities to the public as participants in and consumers of the resulting research. To that end, we introduce Data on the Mind (http://www.dataonthemind.org), a community-focused initiative aimed at meeting the new challenges and opportunities of theory-driven research with big data and naturally occurring datasets. We argue that big data and naturally occurring datasets are most powerfully used to supplement, not supplant, traditional experimental paradigms in understanding human behavior and cognition, and we highlight emerging ethical issues related to the collection, sharing, and use of these powerful datasets.
