Optimization and Execution of Complex Scientific Queries over Uncorrelated Experimental Data

Scientific experiments produce large volumes of data represented as complex objects, each describing an independent event such as a particle collision. Scientific analyses can be expressed as queries that select the objects satisfying complex local conditions over the properties of each object. These conditions include joins, aggregate functions, and numerical computations. Traditional query processing, in which the data is first loaded into a database, performs poorly here because loading and indexing the data costs both time and space. We therefore developed SQISLE, which processes large queries selecting complex objects directly from the sources in a single pass. Our contributions include runtime query optimization strategies that, during query execution, collect runtime statistics, reoptimize the query using the collected statistics, and dynamically switch between optimization strategies. Performance is further improved by query rewrites, temporary view materializations, and compile-time evaluation of query fragments. We demonstrate that queries in SQISLE perform close to hard-coded C++ implementations of the same analyses.
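The idea of collecting statistics during execution and then reoptimizing can be illustrated with a minimal Python sketch. It is not SQISLE's implementation: the function `select_events`, the parameter `profile_size`, and the dictionary-based event representation are all hypothetical, and the sketch assumes every predicate costs the same, so reoptimization reduces to evaluating the most selective predicate first.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Event = Dict[str, float]              # hypothetical event representation
Predicate = Callable[[Event], bool]


def select_events(events: Iterable[Event],
                  predicates: Dict[str, Predicate],
                  profile_size: int = 100) -> Tuple[List[Event], List[str]]:
    """Select events in one pass, reordering predicates after a profiling phase."""
    order = list(predicates)                  # initial order: as given
    passed = {name: 0 for name in order}      # per-predicate pass counts
    seen = 0
    selected: List[Event] = []
    for event in events:
        if seen < profile_size:
            # Profiling phase: evaluate every predicate (no short-circuit)
            # so the pass counts are not biased by the current order.
            results = {name: predicates[name](event) for name in order}
            for name, ok in results.items():
                passed[name] += ok
            keep = all(results.values())
            seen += 1
            if seen == profile_size:
                # Reoptimize: lowest observed pass rate first.
                # Assumes equal predicate cost (a simplification).
                order.sort(key=lambda name: passed[name] / seen)
        else:
            # Steady phase: short-circuit in the reoptimized order.
            keep = all(predicates[name](event) for name in order)
        if keep:
            selected.append(event)
    return selected, order
```

A call such as `select_events(stream, {"has_leptons": p1, "high_pt": p2})` returns the selected events together with the predicate order in use at the end of the pass. A fuller version would also weigh predicate cost and could repeat the profiling phase when the statistics drift.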
