A method to characterize dataset based on objective rule evaluation indices

Data characterization techniques have been developed to control learning algorithm selection by using statistical measurements of a dataset. To expand the framework of meta-learning, it is important to also consider the results of other learning algorithms. We therefore consider a method that reuses objective rule evaluation indices of classification rules. Objective rule evaluation indices such as support, precision, and recall are calculated from a rule set and a validation dataset. This data-driven approach is often used to filter out rules that are not useful from the rule set obtained by a rule learning algorithm. At the same time, these indices can detect differences between two validation datasets when the same rule set is applied to each of them, because the definitions of the indices are independent of both the rule and the dataset. In this paper, we present a method to characterize given datasets based on objective rule evaluation indices by using the differences in correlation coefficients between each pair of indices. By comparing these differences, we describe the resulting groups of similar and dissimilar datasets.
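The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule representation (a predicate paired with a target class), the function names, and the particular definitions of support, precision, and recall are assumptions made for illustration, as is the choice of the Pearson coefficient and of the absolute difference as the comparison.

```python
import math
from itertools import combinations


def rule_indices(rule, data):
    """Compute support, precision and recall of one rule on a validation set.

    Assumed (hypothetical) representation: rule = (condition, target_class),
    data = list of (features, label) pairs.
    """
    condition, target = rule
    covered = sum(1 for x, _ in data if condition(x))
    positive = sum(1 for _, y in data if y == target)
    hit = sum(1 for x, y in data if condition(x) and y == target)
    return {
        "support": hit / len(data),                      # P(condition and class)
        "precision": hit / covered if covered else 0.0,  # P(class | condition)
        "recall": hit / positive if positive else 0.0,   # P(condition | class)
    }


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def index_correlations(rules, data):
    """Correlation between each pair of indices, taken across the rule set."""
    rows = [rule_indices(rule, data) for rule in rules]
    names = sorted(rows[0])
    return {
        (a, b): pearson([r[a] for r in rows], [r[b] for r in rows])
        for a, b in combinations(names, 2)
    }


def correlation_difference(rules, data_a, data_b):
    """Absolute difference of the index correlations on two datasets."""
    corr_a = index_correlations(rules, data_a)
    corr_b = index_correlations(rules, data_b)
    return {pair: abs(corr_a[pair] - corr_b[pair]) for pair in corr_a}
```

Under these assumptions, two validation datasets that yield small values in `correlation_difference` for every index pair would be treated as similar, and large values would mark them as dissimilar. The thresholds and the grouping procedure are left open here because the abstract does not specify them.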
