Requirements for Science Data Bases and SciDB

For the past year, we have been assembling requirements from a collection of scientific data base users in astronomy, particle physics, fusion, remote sensing, oceanography, and biology. The intent has been to specify a common set of requirements for a new science data base system, which we call SciDB. In addition, we have discovered that very complex business analytics share most of the same requirements as “big science”. We have also constructed a partnership of companies to fund the development of SciDB, including eBay, the Large Synoptic Survey Telescope (LSST), Microsoft, the Stanford Linear Accelerator Center (SLAC), and Vertica. Lastly, we have identified two “lighthouse customers” (LSST and eBay) who will run the initial system once it is constructed. In this paper, we report on the requirements we have identified and briefly sketch some of the SciDB design.
