Cellular DBMS: Customizable and autonomous data management using a RISC-style architecture

Database management systems (DBMS) were developed decades ago with the hardware and data management requirements of that era in mind. Over the years, developments in hardware and data management have forced DBMS to grow in functionality. Because of the monolithic architecture of DBMS, this functionality became tightly integrated into the DBMS core. The resulting complexity makes DBMS difficult to tune for consistent performance. Furthermore, the decreasing cost of hardware and software has made human resources a major factor in the total cost of ownership for data management. There is a need to revisit existing database architectures using unconventional and unexplored techniques, moving towards more diversified and loosely coupled architectures. We present the Cellular DBMS architecture, which is designed according to the RISC-style self-tuning database architecture proposed by Chaudhuri and Weikum in their VLDB 2000 paper. The Cellular DBMS architecture proposes constructing a large DBMS from multiple RISC-style cells working in concert, where each cell is an atomic, customized, and autonomous instance of an embedded database. Using the Cellular DBMS architecture, we designed and implemented a customizable and self-tuning storage manager, which we term Evolutionary Column-oriented Storage (ECOS). ECOS supports storage model customization at the table level using different variations of the decomposed storage model. It supports storage structure customization at the column level using evolving, hierarchically organized storage structures. These storage structures evolve automatically as the data grows, taking the workload into account. Their evolution behavior is defined using evolution paths.
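To make the idea of an evolution path concrete, the following is a minimal sketch in C++ and not the ECOS implementation. It assumes that a path is an ordered list of structure kinds, each with a size threshold, and that a column moves to the next kind once it outgrows the current one. All names (`StructureKind`, `EvolutionStep`, `EvolvingColumn`) and the threshold rule are hypothetical. The sketch ignores the workload and only records which kind is active; a plain sorted vector stands in for every physical structure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical structure kinds; the names are illustrative only.
enum class StructureKind { SortedArray, SortedList, BTree };

// One step of an evolution path. Once the column holds more than
// max_entries values, it moves on to the next step (assumed rule).
struct EvolutionStep {
    StructureKind kind;
    std::size_t max_entries;  // ignored for the last step of the path
};

class EvolvingColumn {
public:
    explicit EvolvingColumn(std::vector<EvolutionStep> path)
        : path_(std::move(path)) {
        assert(!path_.empty());
    }

    void insert(int value) {
        // Keep the stand-in container sorted.
        auto pos = std::lower_bound(values_.begin(), values_.end(), value);
        values_.insert(pos, value);
        // Advance along the path while the current step is outgrown.
        while (step_ + 1 < path_.size() &&
               values_.size() > path_[step_].max_entries) {
            ++step_;
        }
    }

    StructureKind kind() const { return path_[step_].kind; }
    std::size_t size() const { return values_.size(); }

private:
    std::vector<EvolutionStep> path_;
    std::size_t step_ = 0;
    std::vector<int> values_;  // stand-in for the real storage structure
};
```

With a path of sorted array (up to 2 entries), sorted list (up to 4 entries), and B-tree, the third insert would switch the column to the sorted list and the fifth to the B-tree. A real storage manager would also migrate the data into the new structure at each step.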
The Cellular DBMS architecture uses innovative software engineering approaches, such as software product lines, feature-oriented programming, and aspect-oriented programming, to realize customization and autonomy. We implemented the Cellular DBMS prototype, including the ECOS storage manager, in C++ using the FeatureC++ and AspectC++ tools. We evaluated our prototype implementation using a custom micro-benchmark to show the benefits of our proposed architecture.

Dedications

To my lovely parents, wife, siblings, and friends.

Acknowledgments

I am thankful to Prof. Dr. Gunter Saake, who gave me the opportunity to work under his supervision within his workgroup. All that I have learned and achieved during my PhD became possible with his support. He has been very supportive during the ups and downs of my PhD studies. It was his support that enabled me to work on my PhD topic, which was and is too ambitious as a PhD project for a single person with limited …
