Big Data in Climate: Opportunities and Challenges for Machine Learning

The climate and Earth sciences have recently undergone a rapid transformation from a data-poor to a data-rich environment. In particular, massive amounts of data about the Earth and its environment are now continuously being generated by a large number of Earth-observing satellites as well as by physics-based Earth system models running on large-scale computational platforms. These massive and information-rich datasets offer huge potential for understanding how the Earth's climate and ecosystem have been changing and how they are being impacted by human actions. We discuss the challenges involved in analyzing these massive datasets as well as the opportunities they present for advancing both machine learning and the science of climate change.
