MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database

This paper presents our experiences in porting the Sloan Digital Sky Survey (SDSS) SkyServer to MonetDB/SQL, a state-of-the-art open-source database system. SDSS serves as a well-documented benchmark for scientific database management. We have built a fully functional prototype of the personal SkyServer, which can be downloaded from our site. The lessons learned are: 1) the column-store approach of MonetDB shows great potential in the world of scientific databases; however, the application also challenged the functionality of our implementation and revealed that a fully operational SQL environment is needed, including, e.g., persistent stored modules; 2) the initial performance is competitive with that of the reference platform, MS SQL Server 2005; and 3) the analysis of SDSS query traces hints at several techniques to boost performance by exploiting repetitive behavior and zoom-in/zoom-out access patterns, which the system currently does not capture.
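To make the column-store idea concrete, here is a minimal sketch of column-at-a-time query evaluation in plain Python. It assumes a hypothetical table fragment loosely modeled on SDSS photometric objects. Each attribute lives in its own dense array, aligned by row position. A predicate scans a single column and produces a list of qualifying positions, later predicates look only at those candidates, and the answer is projected from yet another column. The data and the helpers `select_range`, `refine`, and `fetch` are illustrative assumptions, not MonetDB's actual storage or operator interface.

```python
from array import array

# Hypothetical SDSS-like table fragment stored column-wise:
# each attribute is a separate array, and index i in every array is row i.
columns = {
    "objID": array("q", [101, 102, 103, 104, 105]),
    "ra":    array("d", [180.1, 180.4, 181.0, 179.8, 180.2]),
    "dec":   array("d", [-0.5, 0.2, 0.1, -0.1, 0.3]),
    "r_mag": array("d", [17.2, 21.5, 19.8, 16.9, 18.4]),
}

def select_range(col, lo, hi):
    """Scan one whole column; return positions whose value lies in [lo, hi]."""
    return [i for i, v in enumerate(col) if lo <= v <= hi]

def refine(col, positions, lo, hi):
    """Apply a further [lo, hi] predicate only at the candidate positions."""
    return [i for i in positions if lo <= col[i] <= hi]

def fetch(col, positions):
    """Project: read values of one column at the given positions."""
    return [col[i] for i in positions]

# Roughly: SELECT objID FROM t WHERE ra BETWEEN 180.0 AND 180.5 AND r_mag <= 20.0
cand = select_range(columns["ra"], 180.0, 180.5)
cand = refine(columns["r_mag"], cand, float("-inf"), 20.0)
result = fetch(columns["objID"], cand)
print(result)
```

The query touches only the `ra`, `r_mag`, and `objID` arrays, and `dec` is never read. This is the kind of saving a column store can offer when queries reference only a few attributes of a wide scientific table.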
