Query processing over live and archived data streams

It is becoming increasingly apparent that many important applications over streaming data require access not only to the live stream, but also to historical portions of it. Combining live and historical data is fundamental to tracking, planning, and ad hoc monitoring in diverse domains, including freeway traffic monitoring, supply chain management using RFID data, network traffic analysis, and financial stream analysis. A Data Stream Management System (DSMS) underlying these applications must therefore be able to archive streams to disk and to support queries over a combination of live and archived data. This thesis identifies the challenges associated with such processing and proposes solutions in the storage manager (PATOIS, OSCAR, FELIX) and in the executor (PSoup) to enable such applications. PATOIS is a framework for overload handling in the storage manager that emphasizes the effect of load-reducing solutions on the accuracy of the results returned by the query processor. FELIX and OSCAR are components of this framework that reduce the load associated with index insertions and archive lookups, respectively: the former causes no loss of query accuracy, while the latter lets users trade accuracy in running queries for increased throughput. These techniques are essential to the operation of any DSMS that archives data and supports queries that access the archive. PSoup is a novel approach to query processing based on the duality of queries and data that is observed across the various classes of queries in streaming applications. It allows these classes of queries over streaming data to be executed efficiently and seamlessly.
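
As a rough illustration of what a duality of queries and data can mean, the following minimal Python sketch treats the two symmetrically: a newly registered query is applied to the data already stored, and a newly arrived tuple is matched against the queries already registered. This is one possible reading of the idea under simplifying assumptions, not the PSoup implementation; the class `DualityEngine`, its methods, and the sample tuples are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Predicate = Callable[[dict], bool]


@dataclass
class DualityEngine:
    """Hypothetical toy engine; an assumption for illustration only."""

    data: List[dict] = field(default_factory=list)
    queries: Dict[str, Predicate] = field(default_factory=dict)
    matches: Dict[str, List[dict]] = field(default_factory=dict)

    def add_query(self, name: str, predicate: Predicate) -> None:
        # A new query probes the tuples that have already been stored.
        self.queries[name] = predicate
        self.matches[name] = [t for t in self.data if predicate(t)]

    def add_tuple(self, tup: dict) -> None:
        # A new tuple probes the queries that have already been registered.
        self.data.append(tup)
        for name, predicate in self.queries.items():
            if predicate(tup):
                self.matches[name].append(tup)

    def results(self, name: str) -> List[dict]:
        # Return a copy of the matches accumulated so far for one query.
        return list(self.matches[name])


engine = DualityEngine()
engine.add_tuple({"sensor": "a", "speed": 40})        # arrives before the query
engine.add_query("slow", lambda t: t["speed"] < 50)   # sees the stored tuple
engine.add_tuple({"sensor": "b", "speed": 30})        # arrives after the query
engine.add_tuple({"sensor": "c", "speed": 70})        # does not match
print(engine.results("slow"))
```

In this sketch the query named "slow" returns both the tuple that arrived before it was registered and the matching tuple that arrived afterwards, which is the sense in which stored and live data are handled by the same mechanism.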