Extrapolation Prefix Tree for Data Stream Mining Using a Landmark Model

Since the introduction of FP-growth there has been extensive research into extending its usage to data streams or incremental mining. This task is particularly challenging in the data stream environment because of the unbounded nature of a data stream and the need for avoiding multiple scans of the data. In this paper, we propose an algorithm, Extrapolation Prefix Tree that extracts frequent itemsets using a landmark windowing scheme. The algorithm uses a prefix tree structure to store arriving transactions, but unlike previous approaches estimates the structure of the tree in the next block of data based on the arrival pattern of items appearing in transactions that arrive in the current block. Our experimentation shows that Extrapolation-Tree significantly outperforms the CP-Tree, both in terms of the number of updates and the execution time required to keep the tree current while maintaining a compact tree.