ATLAS: A Distributed File System for Spatiotemporal Data

A majority of the data generated in several domains is geotagged and also has a chronological component. Pervasive data generation and collection efforts have increased data volumes, and these data hold the potential to unlock valuable insights. To support timely knowledge extraction, the underlying file system must satisfy several objectives. In this study, we present Atlas, a distributed file system designed specifically for spatiotemporal data. Atlas includes several capabilities suited to large-scale analyses: aligning data dispersion with data access patterns, balancing storage load, and interoperating with analytical engines such as Hadoop and Spark. Our empirical benchmarks profile several aspects of Atlas and demonstrate the suitability of our methodology.
