Towards building a data-intensive index for big data computing - A case study of Remote Sensing data processing

With recent advances in Remote Sensing (RS) techniques, continuous Earth Observation is generating a tremendous volume of RS data. The proliferation of RS data is revolutionizing the way in which RS data are processed and understood. Data with higher dimensionality, together with the increasing requirement for real-time processing capabilities, have also given rise to the challenging issue of "Data-Intensive (DI) Computing". However, how to properly identify and qualify the DI issue remains a significant problem that is worth exploring. DI computing is a complex issue: while the huge data volume may be one cause, other factors could also be important. In this paper, we propose DI_RS, a novel empirical model that quantifies the DI issues in RS data processing with a normalized DI index. Through experimental analysis of typical algorithms across the whole RS data processing flow, we identify the key factors that most affect the DI issues. Finally, combined with the empirical knowledge of domain experts, we formulate the DI_RS model to describe the correlations between the key factors and the DI index. Experimental validation on further selected RS applications suggests that the DI_RS model is a simple but promising approach.
