Database Cracking: Towards Auto-tuning Database Kernels

Indices are heavily used in database systems to achieve the best possible query processing performance. Creating an index takes a lot of time, and the system must reserve extra storage space for the auxiliary data structure. When updates arrive, there is also the overhead of maintaining the index. Consequently, deciding which indices to create and when to create them has been, and still is, one of the most important research topics of the last decades. If the workload is known up front or can be predicted, and if there is enough idle time to spare, then we can create all necessary indices a priori and exploit them when queries arrive. But what happens if we have neither this knowledge nor this idle time? Similarly, what happens if the workload changes often, suddenly and unpredictably? Even if we can correctly analyze the current workload, the workload pattern may well have changed by the time we finish our analysis and create all necessary indices.

Here we argue that a database system should simply be given the data and the queries in a declarative way, and that the system should internally take care of finding not only the proper algorithms and query plans but also the proper physical design to match the workload and application needs. The goal is to eliminate the role of the database administrator, leading to systems that tune themselves completely automatically and adapt even to dynamic environments.

Database Cracking implements the first adaptive kernel that automatically adapts to the access patterns by selectively and adaptively optimizing the data set purely for the workload at hand. It continuously reorganizes input data on the fly as a side effect of query processing, using queries as advice on how data should be stored. Everything happens within operator calls during query processing, and each reorganization brings knowledge to the system that future operators in future queries can exploit. Essentially, the necessary indices are built incrementally as the system gains more and more knowledge about the workload needs.
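The idea of reorganizing data as a side effect of a range selection can be illustrated with a minimal sketch. It assumes a single in-memory integer column, half-open range predicates `low <= v < high`, and a sorted Python list of pivot values standing in for the cracker index; the names `CrackerColumn`, `range_select` and `_crack` are hypothetical and not taken from any actual system. Each query bound triggers an in-place two-way partition of only the piece of the column that can contain that bound, so later queries with nearby bounds touch ever smaller pieces.

```python
from bisect import bisect_left, insort


class CrackerColumn:
    """Hypothetical cracker column: a copy of one integer column that is
    physically reorganized by every range query it answers."""

    def __init__(self, values):
        self.data = list(values)  # cracker column: copy of the base column
        self.pivots = []          # sorted pivot values (stand-in cracker index)
        self.positions = {}       # pivot -> first position with a value >= pivot

    def _piece_bounds(self, pivot):
        # Find the piece [start, end) between the two pivots surrounding `pivot`.
        i = bisect_left(self.pivots, pivot)
        start = self.positions[self.pivots[i - 1]] if i > 0 else 0
        end = self.positions[self.pivots[i]] if i < len(self.pivots) else len(self.data)
        return start, end

    def _crack(self, pivot):
        """Partition one piece around `pivot` and return the split position."""
        if pivot in self.positions:  # already cracked on this value: no data movement
            return self.positions[pivot]
        start, end = self._piece_bounds(pivot)
        lo, hi = start, end - 1
        # In-place two-way partition: values < pivot end up left of `lo`.
        while lo <= hi:
            if self.data[lo] < pivot:
                lo += 1
            else:
                self.data[lo], self.data[hi] = self.data[hi], self.data[lo]
                hi -= 1
        insort(self.pivots, pivot)
        self.positions[pivot] = lo
        return lo

    def range_select(self, low, high):
        """Return all values v with low <= v < high, cracking as a side effect."""
        if low >= high:
            return []
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]  # qualifying values are now contiguous


col = CrackerColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6])
print(sorted(col.range_select(5, 10)))  # [6, 7, 8, 9]
print(sorted(col.range_select(3, 14)))  # [3, 4, 6, 7, 8, 9, 11, 12, 13]
print(col.pivots)                       # [3, 5, 10, 14]
```

In this sketch the first query scans and partitions the whole column, while the second only partitions the outer pieces that contain its new bounds; a bound that is already a pivot costs a single lookup. Updates, which the text lists as a cost of traditional indices, are not handled here.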
