Performing Group-By before Join

Assume that we have an SQL query containing joins and a group-by. The standard way of evaluating this type of query is to first perform all the joins and then the group-by operation. However, it may be possible to perform the group-by early, that is, to push the group-by operation past one or more joins. Early grouping may reduce the query processing cost by reducing the amount of data participating in joins. We formally define the problem, adhering strictly to the semantics of NULL and duplicate elimination in SQL, and prove necessary and sufficient conditions for deciding when this transformation is valid. In practice, it may be expensive or even impossible to test whether the conditions are satisfied. Therefore, we also present a more practical algorithm that tests a simpler, sufficient condition. This algorithm is fast and detects a large subclass of transformable queries.
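To make the idea of early grouping concrete, here is a minimal sketch of the two evaluation orders for one toy query, roughly `SELECT c.cust_id, c.name, SUM(o.amount) FROM orders o JOIN customers c ON o.cust_id = c.cust_id GROUP BY c.cust_id, c.name`. It is not the paper's algorithm or its validity test. The tables, columns, and the functions `sql_eq`, `join_then_group`, and `group_then_join` are hypothetical. The sketch assumes that `customers.cust_id` is a key and that the aggregate references only `orders` columns; under those assumptions the two plans agree. `None` stands in for SQL NULL, so a NULL join value matches nothing in either plan.

```python
from collections import defaultdict

# Hypothetical toy tables; None plays the role of SQL NULL.
orders = [            # (order_id, cust_id, amount)
    (1, 10, 5.0),
    (2, 10, 7.5),
    (3, 20, 2.0),
    (4, None, 9.0),   # NULL join column: matches no customer under SQL semantics
    (5, 30, 1.0),     # no matching customer
]
customers = [         # (cust_id, name); assumption: cust_id is unique (a key)
    (10, "alice"),
    (20, "bob"),
    (40, "carol"),    # has no orders, so it drops out of the inner join
]

def sql_eq(a, b):
    """Join predicate with SQL NULL semantics: NULL = x is never true."""
    return a is not None and b is not None and a == b

def join_then_group(orders, customers):
    """Standard plan: inner join first, then GROUP BY (cust_id, name) with SUM(amount)."""
    sums = defaultdict(float)
    for _order_id, o_cust, amount in orders:
        for c_id, name in customers:
            if sql_eq(o_cust, c_id):
                sums[(c_id, name)] += amount
    return dict(sums)

def group_then_join(orders, customers):
    """Early-grouping plan: GROUP BY orders.cust_id first, then join the smaller result."""
    partial = defaultdict(float)
    for _order_id, o_cust, amount in orders:
        partial[o_cust] += amount  # NULL forms its own group, as in SQL GROUP BY
    result = {}
    for o_cust, total in partial.items():
        for c_id, name in customers:
            if sql_eq(o_cust, c_id):
                # Relies on cust_id being a key: each group joins at most one row.
                result[(c_id, name)] = total
    return result

print(join_then_group(orders, customers))  # {(10, 'alice'): 12.5, (20, 'bob'): 2.0}
print(group_then_join(orders, customers))  # {(10, 'alice'): 12.5, (20, 'bob'): 2.0}
```

The early plan joins four partial groups instead of five order rows; on realistic data the reduction can be much larger. If `customers` contained duplicate `cust_id` values, the late plan would count each order once per matching customer row while the early plan would not, so the rewrite would no longer be valid. Conditions of this kind are what decide whether the transformation is safe.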