Environmental Data Science

Abstract: Environmental data are growing in complexity, size, and resolution. Addressing the types of large, multidisciplinary problems faced by today's environmental scientists requires the ability to leverage available data and information to inform decision making. Successfully synthesizing heterogeneous data from multiple sources to support holistic analyses and extraction of new knowledge requires application of Data Science. In this paper, we present the origins and a brief history of Data Science. We revisit prior efforts to define Data Science and provide a more modern, working definition. We describe the new professional profile of a data scientist and new and emerging applications of Data Science within Environmental Sciences. We conclude with a discussion of current challenges for Environmental Data Science and suggest a path forward.
