Cross-system NoSQL data transformations with NotaQL

The rising adoption of NoSQL technology in enterprises is creating a heterogeneous landscape of data stores. Each store has distinct advantages and disadvantages, so enterprises need to operate multiple systems for specific purposes. The resulting polyglot persistence is difficult for developers to handle, since some data needs to be replicated and aggregated both between different stores and within the same store. Currently, there are no uniform tools for performing these data transformations, because the stores all feature different APIs and data models. In this paper, we present the transformation language NotaQL, which allows cross-system data transformations. These transformations are output-oriented, meaning that the structure of a transformation script is similar to that of its output. In addition, we provide an aggregation-centric approach, which makes aggregation operations as easy as possible.
