MERRA Analytic Services: Meeting the Big Data challenges of climate science through cloud-enabled Climate Analytics-as-a-Service

Abstract

Climate science is a Big Data domain that is experiencing unprecedented growth. In our efforts to address the Big Data challenges of climate science, we are moving toward a notion of Climate Analytics-as-a-Service (CAaaS). We focus on analytics because it is the knowledge gained from our interactions with Big Data that ultimately produces societal benefits. We focus on CAaaS because we believe it provides a useful way of thinking about the problem: it is a specialization of the concept of business process-as-a-service, which is an evolving extension of IaaS, PaaS, and SaaS enabled by Cloud Computing. Within this framework, Cloud Computing plays an important role; however, we see it as only one element in a constellation of capabilities that are essential to delivering climate analytics as a service. These elements are essential because, in the aggregate, they lead to generativity, a capacity for self-assembly that we feel is the key to solving many of the Big Data challenges in this domain.

MERRA Analytic Services (MERRA/AS) is an example of cloud-enabled CAaaS built on this principle. MERRA/AS enables MapReduce analytics over NASA’s Modern-Era Retrospective Analysis for Research and Applications (MERRA) data collection. The MERRA reanalysis integrates observational data with numerical models to produce a global, temporally and spatially consistent synthesis of 26 key climate variables. It represents a type of data product that is of growing importance to scientists doing climate change research and to a wide range of decision support applications. MERRA/AS brings together the following generative elements in a full, end-to-end demonstration of CAaaS capabilities: (1) high-performance, data-proximal analytics, (2) scalable data management, (3) software appliance virtualization, (4) adaptive analytics, and (5) a domain-harmonized API. The effectiveness of MERRA/AS has been demonstrated in several applications.
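To make the MapReduce analytics mentioned above a little more concrete, the following is a minimal, in-memory sketch of the map, shuffle, and reduce steps for a monthly-mean computation. It is not MERRA/AS's implementation. The flat record layout `(variable, "YYYY-MM", value)` and the helper names `map_phase`, `shuffle`, `reduce_phase`, and `monthly_means` are hypothetical, and the variable names in the sample are illustrative only. Real MERRA data are gridded files, and a production system would distribute the map and reduce tasks across nodes close to the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (variable_name, "YYYY-MM", value).
# Real MERRA data are gridded; this flat tuple only stands in for
# a single value extracted from one grid cell at one time step.
Record = Tuple[str, str, float]
Key = Tuple[str, str]
Partial = Tuple[float, int]  # (partial_sum, count)


def map_phase(records: Iterable[Record]) -> List[Tuple[Key, Partial]]:
    """Emit ((variable, month), (value, 1)) for every input record."""
    return [((var, month), (value, 1)) for var, month, value in records]


def shuffle(pairs: Iterable[Tuple[Key, Partial]]) -> Dict[Key, List[Partial]]:
    """Group intermediate pairs by key, as a MapReduce runtime would."""
    groups: Dict[Key, List[Partial]] = defaultdict(list)
    for key, partial in pairs:
        groups[key].append(partial)
    return groups


def reduce_phase(groups: Dict[Key, List[Partial]]) -> Dict[Key, float]:
    """Combine (sum, count) partials into one mean per key."""
    result: Dict[Key, float] = {}
    for key, partials in groups.items():
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        result[key] = total / count
    return result


def monthly_means(records: Iterable[Record]) -> Dict[Key, float]:
    """Run the three phases end to end on an in-memory record list."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    sample = [
        ("T2M", "1979-01", 270.0),
        ("T2M", "1979-01", 272.0),
        ("T2M", "1979-02", 275.0),
    ]
    print(monthly_means(sample))
```

Carrying `(sum, count)` pairs through the shuffle, rather than averaging inside each mapper, keeps the reduction correct when partial results from different nodes are merged, because averages of averages are not in general the overall average.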
In our experience, Cloud Computing lowers the barriers to, and the risks of, organizational change, fosters innovation and experimentation, facilitates technology transfer, and provides the agility required to meet our customers’ increasing and changing needs. Cloud Computing is providing a new tier in the data services stack that helps connect earthbound, enterprise-level data and computational resources to new customers and to new mobility-driven applications and modes of work. For climate science, Cloud Computing’s capacity to engage communities in the construction of new capabilities is perhaps the most important link between Cloud Computing and Big Data.
