A Big Data Guide to Understanding Climate Change: The Case for Theory-Guided Data Science

Global climate change and its impact on human life have become one of our era's greatest challenges. Despite the urgency and the abundance of climate data, data science has had little impact on furthering our understanding of our planet. This is in stark contrast to other fields such as advertising or electronic commerce, where big data has been a great success story. This discrepancy stems from the complex nature of climate data as well as the scientific questions climate science brings forth. This article introduces a data science audience to the challenges and opportunities of mining large climate datasets, with an emphasis on the nuanced differences between mining climate data and traditional big data approaches. We focus on data, methods, and application challenges that must be addressed in order for big data to fulfill its promise with regard to climate science applications. More importantly, we highlight research showing that relying solely on traditional big data techniques results in dubious findings, and we instead propose a theory-guided data science paradigm that uses scientific theory to constrain both the big data techniques and the results-interpretation process to extract accurate insight from large climate data.
