Mining for Patterns Based on Contingency Tables by KL-Miner - First Experience

Abstract

A new data mining procedure called KL-Miner is presented. The procedure mines for various patterns based on the evaluation of two-dimensional contingency tables, including patterns of a statistical nature. The procedure is a result of the continued development of the academic LISp-Miner system for KDD.

Keywords

Data mining, contingency tables, the LISp-Miner system, statistical patterns

1 Introduction

The goal of this paper is to present first experience with the data mining procedure KL-Miner. The procedure mines for patterns of the form

R ∼ C/γ.

Here R and C are categorial attributes; the attribute R has categories (possible values) r_1, ..., r_K, and the attribute C has categories c_1, ..., c_L. Further, γ is a Boolean attribute.

The KL-Miner procedure deals with data matrices. We suppose that R and C correspond to columns of the analysed data matrix. We further suppose that the Boolean attribute γ is somehow derived from other columns of the analysed data matrix and thus that it corresponds to a Boolean column of the analysed data matrix.

The intuitive meaning of the expression R ∼ C/γ is that the attributes R and C are in the relation given by the symbol ∼ when the condition given by the derived Boolean attribute γ is satisfied.

The symbol ∼ is called a KL-quantifier. It corresponds to a condition imposed by the user on the contingency table of R and C. There are several restrictions that the user can choose to use (e.g. minimal value, sum over the table, value of the χ