Guest Editorial to the special issue on data stream processing

Data stream management has been a hot research area in the database community for the last five years. In response to our call for papers for this special issue, which had a deadline of October 2003, we received 23 submissions covering a wide range of ongoing data stream research. In two rounds of review, we selected five papers that represent the diversity and depth of this research.

Early work on data streams concentrated on developing efficient algorithms for specific data stream queries such as sampling, join size estimation, and quantiles. This issue shows that data stream research has matured and moved beyond purely algorithmic work to novel data types such as XML and to core systems issues.

The stream considered in the first paper consists of XML user queries rather than traditional data records. The paper considers how to efficiently mine frequent XML query patterns. As it is not feasible to keep all queries in main memory, the authors give efficient algorithms to incrementally maintain the set of frequent user queries.

The second paper considers how a data stream management system can deal with load spikes by carefully scheduling operators in the system. The suggested scheduling method, chain scheduling, keeps the output latency within a given bound while minimizing queuing memory.

The third paper shows how to give approximate answers to aggregate queries over datasets undergoing constant change. In particular, it focuses on streams that include not only insertions of new data but also deletions of old data.

The fourth paper is an experience paper. It describes the latest lessons from the design and implementation of the Aurora stream processing engine, along with the authors’ vision for their next system.

The issue concludes with an article on data stream processing in sensor networks. Sensor nodes differ from traditional computers in that energy is one of the limiting factors. The authors propose two methods for saving energy. First, they propose a group-aware network construction that minimizes network traffic. Second, they allow queries to specify that approximate query results (within user-specified bounds) are sufficient, which offers a further opportunity to reduce traffic.

Overall, we believe that these papers are an excellent snapshot of the state of the data stream community as of early 2004, and we hope that you will enjoy reading them as much as we did.