NoSE: Schema Design for NoSQL Applications

Database design is critical for high performance in relational databases, and a myriad of tools exist to aid application designers in selecting an appropriate schema. While the problem of schema optimization is also highly relevant for NoSQL databases, existing tools for relational databases are inadequate in that setting. Application designers wishing to use a NoSQL database instead rely on rules of thumb to select an appropriate schema. We present a system for recommending database schemas for NoSQL applications. Our cost-based approach uses a novel binary integer programming formulation to guide the mapping from the application's conceptual data model to a database schema. We implemented a prototype of this approach for the Cassandra extensible record store. Our prototype, the NoSQL Schema Evaluator (NoSE), is able to capture rules of thumb used by expert designers without explicitly encoding the rules. Automating the design process allows NoSE to produce efficient schemas and to examine more alternatives than would be possible with a manual rule-based approach.
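To make the idea of cost-based schema selection with binary decision variables concrete, the following is a minimal sketch and not NoSE's actual formulation. It assumes a hypothetical set of candidate column families with storage sizes, a hypothetical workload in which each query lists the candidates that could answer it and at what weighted cost, and a storage budget. Each candidate gets a 0/1 decision. For simplicity the sketch enumerates all assignments by brute force, where a real tool would hand the program to an integer programming solver. All names and numbers are illustrative assumptions.

```python
from itertools import product

# Hypothetical candidate column families: name -> storage size (arbitrary units).
CANDIDATE_SIZE = {"users_by_id": 4, "users_by_city": 5, "orders_by_user": 6}

# Hypothetical workload: query -> {candidate that can answer it: weighted cost}.
QUERY_COST = {
    "q_user_lookup": {"users_by_id": 1.0, "users_by_city": 8.0},
    "q_users_in_city": {"users_by_city": 2.0},
    "q_user_orders": {"orders_by_user": 1.5},
}


def best_schema(storage_budget):
    """Return (total_cost, chosen_candidates), or None if no schema is feasible.

    One binary variable per candidate says whether it is materialized.
    Constraints: total size fits the budget, and every query has at least
    one chosen candidate that can answer it. Objective: sum over queries
    of the cheapest available plan.
    """
    names = list(CANDIDATE_SIZE)
    best = None
    # Brute-force stand-in for a BIP solver: try every 0/1 assignment.
    for bits in product((0, 1), repeat=len(names)):
        chosen = {n for n, b in zip(names, bits) if b}
        if sum(CANDIDATE_SIZE[n] for n in chosen) > storage_budget:
            continue  # violates the storage constraint
        total = 0.0
        feasible = True
        for plans in QUERY_COST.values():
            usable = [cost for n, cost in plans.items() if n in chosen]
            if not usable:
                feasible = False  # some query cannot be answered
                break
            total += min(usable)
        if feasible and (best is None or total < best[0]):
            best = (total, chosen)
    return best


if __name__ == "__main__":
    print(best_schema(15))  # budget large enough for every candidate
    print(best_schema(11))  # tighter budget forces a costlier plan
```

In this toy instance, shrinking the budget from 15 to 11 drops `users_by_id`, so the user lookup falls back to the more expensive `users_by_city` plan. That is the kind of trade-off a cost-based formulation weighs automatically instead of relying on a fixed rule of thumb.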
