Increasing the Efficiency of Data Mining Algorithms with Breadth-First Marker Propagation

This paper describes how to increase the efficiency of inductive data mining algorithms by replacing the central matching operation with a marker propagation technique. Breadth-first marker propagation is most beneficial when the data are linked to hierarchical background knowledge (e.g., tree-structured attributes), or when the attributes describing the data have many values. We support our claims analytically with complexity arguments and empirically on several large data sets. We also point out other efficiency gains, including reduced memory management overhead, which facilitate mining massive tape archives.
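To make the central idea concrete, the following is a minimal sketch of breadth-first marker propagation over a single tree-structured attribute. It is an illustration under stated assumptions, not the paper's implementation: the hierarchy is assumed to be an acyclic child-to-parent map, each example is assumed to be a pair of a leaf value and a class label, and the names `propagate_markers`, `PARENT`, and `EXAMPLES` are hypothetical. Class-count markers are deposited at the values that appear in the data and then passed upward one level at a time, so every node in the hierarchy ends up with the class counts of the examples it covers, without matching each candidate value against each example.

```python
from collections import Counter, defaultdict


def propagate_markers(parent, examples):
    """Return {node: Counter of class labels} for every node that covers data.

    parent   -- hypothetical child -> parent map; assumed to be a tree (no cycles)
    examples -- iterable of (attribute_value, class_label) pairs
    """
    totals = defaultdict(Counter)

    # Deposit one marker per example at the value it is linked to.
    frontier = defaultdict(Counter)
    for value, label in examples:
        frontier[value][label] += 1

    # Move all markers up one level per iteration (breadth-first).
    while frontier:
        next_frontier = defaultdict(Counter)
        for node, markers in frontier.items():
            totals[node].update(markers)          # accumulate counts at this node
            up = parent.get(node)
            if up is not None:                    # the root has no parent
                next_frontier[up].update(markers)
        frontier = next_frontier

    return dict(totals)


# Hypothetical toy hierarchy and data, for illustration only.
PARENT = {
    "circle": "round", "ellipse": "round",
    "square": "polygon", "triangle": "polygon",
    "round": "shape", "polygon": "shape",
}
EXAMPLES = [
    ("circle", "+"), ("ellipse", "+"), ("square", "-"),
    ("triangle", "+"), ("circle", "-"),
]

COUNTS = propagate_markers(PARENT, EXAMPLES)
for node in ("circle", "round", "polygon", "shape"):
    print(node, dict(COUNTS[node]))
```

In this sketch, markers for examples that share a value are merged before they move upward, so the work per level depends on the number of distinct marked nodes rather than on the number of examples times the number of candidate values.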