Mathematical Tools for Data Mining: Set Theory, Partial Orders, Combinatorics

As the field of data mining has matured, its level of mathematical sophistication has increased. Disciplines such as topology, combinatorics, partially ordered sets and their associated algebraic structures (lattices and Boolean algebras), and metric spaces are increasingly applied in data mining research. This book presents these mathematical foundations of data mining together with applications, providing the reader with a comprehensive reference. The mathematics is presented thoroughly and rigorously, with a detailed explanation of each topic, and applications to data mining, such as frequent item sets, clustering, and decision trees, are also discussed. More than 400 exercises are included, and they form an integral part of the material. Some of the exercises are in effect supplemental material, and their solutions are included. The reader is assumed to have a knowledge of elementary analysis.

Features and topics: the study of functions and relations; applications provided throughout; graphs and hypergraphs; partially ordered sets, lattices, and Boolean algebras; finite partially ordered sets; metric spaces; combinatorics; and the theory of the Vapnik-Chervonenkis dimension of collections of sets.

This wide-ranging, thoroughly detailed volume is self-contained and intended for researchers and graduate students, for whom it will prove an invaluable reference tool.
