Index Support for Mining Data Streams in a Relational DBMS

This paper presents a novel index, called I-Forest, to support data mining on data streams, i.e., sequences of incoming data blocks. The approach is suited to itemset extraction on evolving datasets, such as transactional data streams from retail chains. The index is a covering structure that represents transaction blocks in a succinct form and allows different kinds of analysis (e.g., analysis of quarterly data). No support constraint is enforced during the creation phase, so the index provides a complete representation of the data stream. The I-Forest index has been implemented in the PostgreSQL open source DBMS and exploits its physical-level access methods. Preliminary experiments have been run to validate the proposed approach.
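
The idea of indexing each incoming block completely and applying the support constraint only at analysis time can be illustrated with a minimal sketch. This is not the I-Forest structure itself: the class `BlockIndex`, its methods, and the block identifiers are hypothetical names introduced for illustration, and the in-memory dictionaries stand in for whatever physical layout the actual index uses inside PostgreSQL.

```python
from collections import Counter
from itertools import combinations


class BlockIndex:
    """Toy per-block index (hypothetical, not the I-Forest layout).

    Each block keeps every distinct transaction with its multiplicity,
    so nothing is pruned at creation time.
    """

    def __init__(self):
        # block id -> Counter mapping a sorted item tuple to its multiplicity
        self.blocks = {}

    def add_block(self, block_id, transactions):
        # Store the block completely; no support threshold is applied here.
        self.blocks[block_id] = Counter(
            tuple(sorted(set(t))) for t in transactions
        )

    def frequent_itemsets(self, block_ids, min_support):
        """Return itemsets whose relative support over the chosen blocks
        is at least min_support. Enumerates all subsets of each distinct
        transaction, so it is only suitable for small examples."""
        counts = Counter()
        n = 0
        for block_id in block_ids:
            for items, mult in self.blocks[block_id].items():
                n += mult
                for k in range(1, len(items) + 1):
                    for itemset in combinations(items, k):
                        counts[itemset] += mult
        if n == 0:
            return {}
        # The support constraint is enforced only now, at query time.
        return {s: c for s, c in counts.items() if c / n >= min_support}


index = BlockIndex()
index.add_block("Q1", [["a", "b"], ["a", "c"], ["a", "b"]])
index.add_block("Q2", [["b", "c"], ["a", "b"]])

# Analyze one quarter, or any combination of blocks, with any threshold.
q1_only = index.frequent_itemsets(["Q1"], min_support=0.5)
both = index.frequent_itemsets(["Q1", "Q2"], min_support=0.5)
```

Because the stored blocks are complete, the same index can answer queries over any selection of blocks and any support threshold without being rebuilt.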