Data Science of the Natural Environment: A Research Roadmap

Data science is the science of extracting meaning from potentially complex data. It is a fast-moving field that draws principles and techniques from several disciplines, including computer science, statistics and complexity science. Data science is having a profound impact on areas such as commerce, health and smart cities. This paper argues that it can have an equal, if not greater, impact on the earth and environmental sciences, offering a rich set of new techniques to support both a deeper understanding of the natural environment in all its complexity and the development of well-founded mitigation and adaptation strategies in the face of climate change. The paper also argues that the natural environment poses new challenges for data science, particularly around complexity, spatial and temporal reasoning, and managing uncertainty. It then describes a case study in environmental data science that offers insights into the promise of the area. The paper concludes with a research roadmap highlighting ten top challenges of environmental data science, and with an invitation to join an international community working collaboratively on these problems.
