Remote sensing big data computing: Challenges and opportunities

As we have entered an era of high-resolution Earth observation, remote sensing (RS) data are undergoing explosive growth. This proliferation also gives rise to increasing complexity in RS data, notably their diversity and higher dimensionality. RS data are therefore regarded as RS "Big Data". Fortunately, we are witnessing a coming technological leap. In this paper, we give a brief overview of Big Data and data-intensive problems, including the analysis of RS Big Data, Big Data challenges, and current techniques and work on processing RS Big Data. This paper identifies the properties and features of remote sensing big data. This paper reviews the state of the art in remote sensing big data computing. This paper discusses the "data-intensive computing" issues in remote sensing big data processing.
