Scalable continuous query processing

Continuous queries are persistent queries that deliver new results to users as those results become available. They are useful in many applications centered on data changes, particularly in an Internet environment that contains large amounts of frequently changing information. Unlike ordinary queries, which are discarded immediately after execution, continuous queries can remain in the system for arbitrarily long periods. To serve a large number of users with diverse interests, a continuous query system must support millions of continuous queries, expressed as complex queries over web-resident data sets. No existing system has achieved this level of scalability. In this dissertation, we address the problem by grouping continuous queries, based on the observation that many web queries share similar structures. Grouped queries can share common computation, tend to fit in memory, and can reduce I/O cost significantly. Furthermore, grouping on selection predicates can eliminate a large number of unnecessary query invocations. Our grouping technique differs from previous group optimization approaches mainly in that it uses an incremental group optimization strategy over a large, dynamic query workload. In addition, we design and evaluate alternative selection placement strategies for optimizing a very large number of continuous queries in an Internet environment, as well as an efficient, dynamic regrouping approach for optimizing a large continuous query workload.
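To make the idea of grouping on selection predicates concrete, the following is a minimal sketch, not the dissertation's implementation. It assumes that all queries in a group share a selection of the form `attribute == constant` and differ only in the constant. The class name `PredicateGroup`, its methods, and the sample query identifiers and records are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


class PredicateGroup:
    """Hypothetical group of queries whose selection is `attribute == constant`."""

    def __init__(self, attribute):
        self.attribute = attribute
        # Maps each constant to the ids of the queries that select on it.
        self.by_constant = defaultdict(list)

    def add_query(self, query_id, constant):
        # Incremental: a new query joins the existing group in place,
        # without regrouping the queries that are already installed.
        self.by_constant[constant].append(query_id)

    def matching_queries(self, changed_record):
        # One lookup finds the affected queries; queries on other
        # constants are never invoked for this change.
        value = changed_record.get(self.attribute)
        return list(self.by_constant.get(value, []))


group = PredicateGroup("symbol")
group.add_query("q1", "INTC")
group.add_query("q2", "MSFT")
group.add_query("q3", "INTC")

affected = group.matching_queries({"symbol": "INTC", "price": 21.5})
unaffected = group.matching_queries({"symbol": "IBM", "price": 90.0})
```

In this sketch a change to one record triggers a single hash lookup rather than one invocation per installed query, which is the kind of saving the grouping argument relies on. Queries with more complex predicates would need a richer group structure than a single equality index.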