From Relations to Multi-dimensional Maps: Towards an SQL-to-HBase Transformation Methodology

In this paper, we describe a method for transforming and migrating data schemas developed for relational database management systems (RDBMSs) to HBase. The method consists of a set of HBase-organization guidelines and a four-step data-schema transformation process that HBase application developers can follow when migrating their application data from an RDBMS to HBase. The method also takes into account the data-access paths extracted from query logs, in order to improve the quality of the transformation and the eventual access efficiency of the HBase repository. We illustrate and validate the method with a case study.
