Guiding Search in Relational Pathfinding-based Concept Discovery via Bivariate Statistical Methods

Relational pathfinding-based systems learn concept descriptors by extending candidate concept descriptors one literal at a time. Because such learning systems usually deal with large search spaces, choosing which literal to add to a candidate concept descriptor is an essential issue. In this study, we empirically analyze the applicability of three bivariate statistical methods, namely frequency ratio, hazard index, and weight of evidence, as heuristics for choosing these literals. In 10-fold experiments on three benchmark datasets, all three heuristics reduced the search space and hence provided speedups compared to extending candidate concept descriptors with a randomly chosen literal. Moreover, the heuristic-based settings provided improved predictive accuracy.
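
As a rough illustration of how a bivariate statistic could rank candidate literals, the sketch below scores each literal from the number of positive and negative examples it covers. It is not the scoring procedure of this study: the mapping from literals to counts, the literal names, and the counts themselves are hypothetical, and the formulas are the common textbook forms of frequency ratio and of the positive weight in weight of evidence. Hazard index is left out because its definition is not given here.

```python
import math

def frequency_ratio(covered_pos, covered_total, all_pos, all_total):
    """Share of positive examples covered by the literal, divided by
    the share of all examples it covers. Assumes nonzero counts."""
    return (covered_pos / all_pos) / (covered_total / all_total)

def weight_of_evidence(covered_pos, covered_neg, all_pos, all_neg):
    """Positive weight W+ = ln(P(literal | pos) / P(literal | neg)).
    Assumes nonzero counts; a real system would need smoothing."""
    return math.log((covered_pos / all_pos) / (covered_neg / all_neg))

# Hypothetical data: 40 positive and 60 negative examples, and for each
# candidate literal the (positives, negatives) it covers.
ALL_POS, ALL_NEG = 40, 60
candidates = {
    "parent(A,C)": (30, 10),
    "sibling(A,C)": (20, 30),
    "married(A,C)": (5, 25),
}

fr_scores = {
    lit: frequency_ratio(p, p + n, ALL_POS, ALL_POS + ALL_NEG)
    for lit, (p, n) in candidates.items()
}
woe_scores = {
    lit: weight_of_evidence(p, n, ALL_POS, ALL_NEG)
    for lit, (p, n) in candidates.items()
}

# Extend the candidate descriptor with the highest-scoring literal
# instead of a randomly chosen one.
best_by_fr = max(fr_scores, key=fr_scores.get)
best_by_woe = max(woe_scores, key=woe_scores.get)
```

With these made-up counts both scores prefer the same literal. A score near 1 for frequency ratio, or near 0 for the positive weight, marks a literal that covers positives and negatives in the same proportion and so carries little information.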