Information-Theoretic Measures for Knowledge Discovery and Data Mining

A database may be considered as a statistical population, and an attribute as a statistical variable taking values from its domain. One can carry out statistical and information-theoretic analysis on a database. Based on the attribute values, a database can be partitioned into smaller populations. An attribute is deemed important if it partitions the database such that previously unknown regularities and patterns are observable. Many information-theoretic measures have been proposed and applied to quantify the importance of attributes and relationships between attributes in various fields. In the context of knowledge discovery and data mining (KDD), we present a critical review and analysis of information-theoretic measures of attribute importance and attribute association, with emphasis on their interpretations and connections.
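The following is a minimal Python sketch that makes this setting concrete. It assumes an information table stored as a list of dicts and measures entropy in bits (log base 2). The rows are partitioned by the values of one attribute, and that attribute is scored by the Shannon mutual information I(A; B) = H(B) - H(B | A) with a target attribute. This is only one standard measure of the kind reviewed here, not necessarily the paper's exact formulation. The helper names and the toy table are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def partition(table, attr):
    """Group rows into blocks by their value on `attr` (one block per value)."""
    blocks = defaultdict(list)
    for row in table:
        blocks[row[attr]].append(row)
    return dict(blocks)


def conditional_entropy(table, target, given):
    """H(target | given): block-size-weighted entropy of `target` in each block."""
    n = len(table)
    return sum(
        (len(block) / n) * entropy([row[target] for row in block])
        for block in partition(table, given).values()
    )


def mutual_information(table, a, b):
    """I(a; b) = H(b) - H(b | a); zero when a and b are empirically independent."""
    return entropy([row[b] for row in table]) - conditional_entropy(table, b, a)


# Hypothetical toy information table (illustrative only, not from the paper).
table = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]

for attr in ("outlook", "windy"):
    print(attr, mutual_information(table, attr, "play"))
```

Running it prints `outlook 1.0` and `windy 0.0`. Partitioning by `outlook` makes `play` fully determined within each block, so H(play | outlook) = 0. Partitioning by `windy` leaves the distribution of `play` unchanged in every block, so that attribute reveals no regularity about `play`.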
