On managing geospatial big-data in emergency management: some perspectives

With the rapid growth of mobile devices and applications, geo-tagged data has become increasingly important in emergency management and has turned into a major workload for big data storage systems. Traditional methods that store geospatial data in centralized databases suffer from inherent limitations, such as the difficulty of scaling out as the volume of geospatial data grows. To achieve scalability, a number of solutions for big geospatial data management have been proposed in recent years. They can be broadly classified into two kinds: those that extend distributed databases, and those that migrate to big-data storage systems. The former mostly adopt a massively parallel processing (MPP) architecture, in which data are stored and retrieved across a set of independent nodes, each of which can be treated as a traditional database instance with a geospatial extension. The latter tend to build an additional index layer on top of general-purpose distributed data stores, e.g., HBase, Cassandra, and MongoDB, to support geospatial data while integrating with the existing big-data ecosystem. However, no data management system is perfect: some approaches excel at execution efficiency, while others better meet the programming-level needs of big data scenarios. In this paper, we analyze the requirements and challenges of geospatial big data storage in emergency management, followed by a discussion of our own perspectives drawn from practical cases. The purpose of this paper is not only to address how to build a geospatial data storage platform, but also to address how to justify the design rationale of the geospatial big data system that we plan to build.
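To make the idea of "an additional index layer above a general-purpose distributed data store" concrete, here is a minimal sketch of one common technique: encoding (latitude, longitude) into a Z-order (Morton) code and using it as the row key of a store whose rows are kept sorted by key, so that a bounding-box query becomes a key-range scan followed by an exact filter. This is an illustration under stated assumptions, not the method of this paper or of any system named above. The names `SortedKVStore`, `GeoIndex`, `z_key`, and the 16-bit-per-dimension precision are hypothetical, and the in-memory sorted list only stands in for a real sorted store.

```python
import bisect

BITS = 16  # hypothetical precision: 16 bits per dimension, 32-bit Z-order codes


def quantize(value, lo, hi, bits=BITS):
    """Map a coordinate in [lo, hi] to an integer cell in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return int((clamped - lo) / (hi - lo) * cells)


def interleave(x, y, bits=BITS):
    """Interleave the bits of x and y into a Morton (Z-order) code.

    Bits of x go to even positions and bits of y to odd positions, so the
    code is monotone in each coordinate when the other is held fixed.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_key(lat, lon):
    """Hypothetical row key: Z-order code of the quantized (lon, lat) point."""
    return interleave(quantize(lon, -180.0, 180.0), quantize(lat, -90.0, 90.0))


class SortedKVStore:
    """In-memory stand-in for a store whose rows are sorted by key (HBase-like)."""

    def __init__(self):
        self.keys = []
        self.rows = []

    def put(self, key, row):
        # Duplicate keys are allowed; bisect_right keeps insertion order stable.
        i = bisect.bisect_right(self.keys, key)
        self.keys.insert(i, key)
        self.rows.insert(i, row)

    def scan(self, start, stop):
        """Return rows whose key k satisfies start <= k <= stop."""
        i = bisect.bisect_left(self.keys, start)
        j = bisect.bisect_right(self.keys, stop)
        return self.rows[i:j]


class GeoIndex:
    """Geospatial index layer built on top of the sorted key-value store."""

    def __init__(self):
        self.store = SortedKVStore()

    def insert(self, record_id, lat, lon):
        self.store.put(z_key(lat, lon), (record_id, lat, lon))

    def bbox_query(self, min_lat, min_lon, max_lat, max_lon):
        # Every point inside the box has a code between the codes of the
        # lower-left and upper-right corners, so one range scan yields a
        # superset of the answer; the exact filter removes false positives.
        candidates = self.store.scan(z_key(min_lat, min_lon), z_key(max_lat, max_lon))
        return [rid for rid, lat, lon in candidates
                if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon]


idx = GeoIndex()
idx.insert("shelter-1", 30.52, 114.31)
idx.insert("shelter-2", 30.60, 114.25)
idx.insert("hospital-1", 39.90, 116.40)
print(sorted(idx.bbox_query(30.0, 114.0, 31.0, 115.0)))  # → ['shelter-1', 'shelter-2']
```

A single range scan between the corner codes is the simplest correct choice but can touch many rows outside the box; a more refined design would decompose the box into several smaller Z-order ranges before scanning. The trade-off between scan cost and index complexity is one instance of the efficiency-versus-programmability tension discussed above.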