UDBMS: Road to Unification for Multi-model Data Management

One of the greatest challenges in big data management is the “Variety” of the data. Data may come in various types and formats: structured, semi-structured, and unstructured. For instance, data can be modeled as relational, key-value, or graph data. A single data platform that manages both well-structured data and NoSQL data benefits users, because it significantly reduces integration, migration, development, maintenance, and operational issues. A challenging research problem is therefore how to develop an efficient, consolidated data management platform that covers both NoSQL and relational data, so as to reduce integration issues, simplify operations, and eliminate migration issues. In this paper, we envision novel principles and technologies for handling multiple data models in one unified database system, including model-agnostic storage, unified query processing and indexes, in-memory structures, and multi-model transactions. We discuss our vision and present the research challenges that need to be addressed.
