Operator-Based Query Progress Estimation

Recently, research has addressed the problem of estimating progress for long-running database queries. The basic idea is to “continuously” monitor execution to keep track of how much work has been done, and at the same time to collect statistics to arrive at an increasingly refined estimate of the total amount of work that is needed. Previous research has generally decomposed the operator tree for the query into pipelines (or “segments”) of non-blocking operators, tried to observe progress per pipeline, and then combined the progress measures of the different pipelines into an overall progress measure. It soon became apparent that pipelines of non-blocking operators are too large as units and that it is necessary to define smaller segments (e.g., containing only one join operator). In this paper we take a more radical approach in which each operator in a query tree is able to estimate the progress achieved for its subtree based on the progress reported by its children. No global analysis of the query tree is needed, nor is it necessary to determine driver nodes or dominant inputs. Each operator is strictly independent in its progress estimation. Nevertheless, progress estimation works well across blocking operators and for the whole query tree. The technique lends itself to a simple and clean implementation. It is suitable for extensible database architectures where the set of query processing operators is large and may be extended at any time. Our implementation allows one to add progress support for operators gradually, such that the system remains operational at all times and reports progress whenever all operators in the query tree support progress. We report on a prototypical implementation in the SECONDO extensible database system. Progress estimation is now a standard feature of SECONDO. To our knowledge, it is the first freely available DBMS prototype that includes query progress estimation.
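
To make the operator-local idea concrete, the following is a minimal illustrative sketch in Python. It is not the SECONDO implementation. The `Progress` record, the class names `Scan`, `Filter`, and `Sort`, the counters, and the unit cost of one work unit per tuple are all assumptions introduced for this example. Each operator computes progress for its subtree using only its own counters and the report of its child, and the blocking `Sort` is handled in the same way as the non-blocking operators.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    """Hypothetical report an operator gives for its whole subtree."""
    done: float   # work units completed so far in the subtree
    total: float  # current estimate of total work units for the subtree
    card: float   # estimated number of tuples the subtree will output

    @property
    def fraction(self) -> float:
        if self.total <= 0:
            return 0.0
        return min(1.0, self.done / self.total)


class Scan:
    """Leaf operator: one assumed work unit per tuple read."""
    def __init__(self, rows_total: int):
        self.rows_total = rows_total
        self.rows_read = 0

    def progress(self) -> Progress:
        n = float(self.rows_total)
        return Progress(float(self.rows_read), n, n)


class Filter:
    """Non-blocking operator: one assumed work unit per input tuple."""
    def __init__(self, child, selectivity: float):
        self.child = child
        self.selectivity = selectivity  # assumed fixed here for simplicity
        self.rows_seen = 0

    def progress(self) -> Progress:
        c = self.child.progress()  # only the child's report is consulted
        return Progress(c.done + self.rows_seen,
                        c.total + c.card,
                        c.card * self.selectivity)


class Sort:
    """Blocking operator: one unit to consume and one to emit each tuple."""
    def __init__(self, child):
        self.child = child
        self.rows_in = 0
        self.rows_out = 0

    def progress(self) -> Progress:
        c = self.child.progress()
        own_done = self.rows_in + self.rows_out
        own_total = 2 * c.card
        return Progress(c.done + own_done, c.total + own_total, c.card)


# Usage: build a small tree and simulate a query that is partway through.
scan = Scan(rows_total=1000)
filt = Filter(scan, selectivity=0.1)
root = Sort(filt)
scan.rows_read = 500
filt.rows_seen = 500
root.rows_in = 50
print(f"estimated progress: {root.progress().fraction:.1%}")
```

In this sketch the root is asked for progress and the request simply recurses down the tree, so no global analysis is involved. A fuller version would presumably refine quantities such as the selectivity from observed counts during execution rather than keep them fixed.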