Parallel SECONDO: Practical and efficient mobility data processing in the cloud

This paper presents Parallel Secondo, a hybrid parallel processing system. It combines the Hadoop framework with a set of single-computer Secondo databases in order to bring mobility data processing to the parallel processing community, and vice versa. The system keeps the front end and the executable language of Secondo, so that users can state parallel queries in the same way as ordinary sequential queries. In addition, a set of auxiliary scripts makes the system easy to manage regardless of the size of the underlying cluster, and keeps the Hadoop platform as a transparent layer of the system. A parallel data model is also proposed that encapsulates all available Secondo data types and operators, which makes it possible to transform any sequential Secondo query into its corresponding parallel expression. For instance, all example queries of the moving objects database benchmark BerlinMOD have been transformed, and two of them are demonstrated in this paper. Finally, the evaluation shows that Parallel Secondo is not only practical but also efficient: for queries involving large amounts of data, it achieves both linear speed-up and scale-up.
