Information-Theoretic Measures for Knowledge Discovery and Data Mining

A database may be viewed as a statistical population, and an attribute as a statistical variable taking values from its domain. This view allows statistical and information-theoretic analysis to be carried out on a database. Based on the values of an attribute, a database can be partitioned into smaller populations. An attribute is deemed important if it partitions the database in such a way that previously unknown regularities and patterns become observable. Many information-theoretic measures have been proposed and applied in various fields to quantify the importance of attributes and the relationships between attributes. In the context of knowledge discovery and data mining (KDD), we present a critical review and analysis of information-theoretic measures of attribute importance and attribute association, with emphasis on their interpretations and connections.
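As a rough illustration of the view described above, the following sketch treats each column of a small table as a statistical variable and computes two standard quantities. The abstract names no specific measures, so Shannon entropy and mutual information are used here only as common representatives. The function names `entropy` and `mutual_information` and the toy table `rows` are hypothetical and chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired observations."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical toy table: each row is (outlook, windy, play).
rows = [
    ("sunny", "yes", "no"),
    ("sunny", "no", "no"),
    ("rainy", "yes", "yes"),
    ("rainy", "no", "yes"),
]
outlook = [r[0] for r in rows]
windy = [r[1] for r in rows]
play = [r[2] for r in rows]

# In this toy table, `outlook` partitions the rows so that `play` is constant
# within each block, while `windy` leaves `play` evenly mixed in each block.
print("H(play)          =", entropy(play))
print("I(outlook; play) =", mutual_information(outlook, play))
print("I(windy; play)   =", mutual_information(windy, play))
```

Under this reading, an attribute whose partition leaves little uncertainty about another attribute scores high on association with it, while an attribute that is statistically independent of the other scores zero.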
