Black-box determination of cost models' parameters for federated stream-processing systems

To distribute and deploy queries in distributed stream-processing environments, it is vital to estimate the expected costs in advance. When heterogeneous Stream-Processing Systems (SPSs) run on various hosts, the parameters of an operator's cost model must be determined by measurement for each relevant combination of SPS and hardware. This paper presents a black-box method that determines the parameters of appropriate cost models that account for system-specific behavior. For some SPSs, no appropriate cost model may be available because their internals are unknown. For cases where no cost model is available, for whatever reason, we provide and apply a non-parametric model.
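
The following is a minimal sketch of the general idea, not the paper's actual method. The abstract does not state which cost models or which non-parametric technique are used, so everything here is an assumption made for illustration: the linear model `cost = a + b * rate`, piecewise-linear interpolation as the non-parametric fallback, the function names `fit_linear_cost` and `interpolate_cost`, and the made-up measurement values.

```python
from bisect import bisect_left


def fit_linear_cost(rates, costs):
    """Least-squares fit of the assumed model cost = a + b * rate.

    `rates` and `costs` are black-box measurements taken for one
    operator on one SPS/hardware combination. Returns (a, b).
    """
    n = len(rates)
    mean_r = sum(rates) / n
    mean_c = sum(costs) / n
    sxx = sum((r - mean_r) ** 2 for r in rates)
    sxy = sum((r - mean_r) * (c - mean_c) for r, c in zip(rates, costs))
    b = sxy / sxx
    a = mean_c - b * mean_r
    return a, b


def interpolate_cost(rates, costs, rate):
    """Assumed non-parametric fallback: piecewise-linear interpolation
    between measured points, clamped to the measured range."""
    points = sorted(zip(rates, costs))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if rate <= xs[0]:
        return ys[0]
    if rate >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, rate)  # xs[i-1] < rate <= xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (rate - x0) / (x1 - x0)


# Hypothetical measurements: input rate (tuples/s) vs. observed cost.
measured_rates = [100, 200, 300, 400]
measured_costs = [3.0, 4.0, 5.0, 6.0]

a, b = fit_linear_cost(measured_rates, measured_costs)
parametric_estimate = a + b * 250
fallback_estimate = interpolate_cost(measured_rates, measured_costs, 250)
```

The parametric fit applies when a model form is known for the system under test. The interpolation needs only the measurements, which is why it can stand in when no model is available.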
