XQuery processing over NoSQL stores

Using NoSQL stores as the storage layer for declarative query processing with XQuery provides a high-level interface for processing data in an optimized manner. The term NoSQL refers to a plethora of new stores that essentially trade off the well-known ACID properties for higher availability or scalability, using techniques such as eventual consistency, horizontal scalability, efficient replication, and schema-less data models. This work proposes a mapping from the data models of different kinds of NoSQL stores (key/value, columnar, and document-oriented) to the XDM data model, thus allowing for standardization and for querying NoSQL data with higher-level languages such as XQuery. This work also explores several optimization scenarios to improve performance on top of these stores. In addition, we add updating semantics to XQuery by introducing simple CRUD-enabling functionality. Finally, this work analyzes the performance of the system in several scenarios.
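
The abstract does not spell out the mapping itself, so the following is only a minimal sketch of the general idea for the document-oriented case: a nested, JSON-like record is turned into an XML element tree, which here stands in for an XDM node tree that a path expression can navigate. The function name `to_node`, the wrapper element names `record` and `item`, and the sample data are all hypothetical and are not taken from this work. The sketch also assumes that every key is a valid XML element name.

```python
import xml.etree.ElementTree as ET


def to_node(name, value):
    """Map a JSON-like value to an XML element (a stand-in for an XDM node).

    Hypothetical helper: the mapping rules below are illustrative assumptions,
    not the mapping proposed in this work.
    """
    elem = ET.Element(name)
    if isinstance(value, dict):
        # Each field becomes a child element named after its key.
        for key, child in value.items():
            elem.append(to_node(key, child))
    elif isinstance(value, list):
        # Array members have no name of their own, so wrap each in <item>.
        for member in value:
            elem.append(to_node("item", member))
    else:
        # Atomic values become the text content of the element.
        elem.text = str(value)
    return elem


# Sample document as a document-oriented store might return it (made-up data).
doc = {"name": "Ada", "tags": ["db", "xml"], "address": {"city": "Berlin"}}

root = to_node("record", doc)

# Once the record is a tree, it can be queried with a path expression,
# analogous to evaluating /record/address/city in XQuery.
cities = [e.text for e in root.findall("./address/city")]
tag_count = len(root.findall("./tags/item"))
```

Key/value and columnar stores would need their own rules (for example, treating a row key and its column families as nested elements); those are left out here because the abstract gives no details about them.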
