Data Mining and Knowledge Discovery Methods with Case Examples

This chapter deals with knowledge discovery and data mining, which has emerged as an important research direction for extracting useful information from vast repositories of data of various types. The basic concepts, problems, and challenges are first briefly discussed. Some of the major data mining tasks, such as classification, clustering, and association rule mining, are then described in some detail. This is followed by a description of some tools that are frequently used for data mining. Two case examples of supervised and unsupervised classification for satellite image analysis are presented. Finally, an extensive bibliography is provided.

[1] James F. Baldwin. Knowledge from data using fuzzy methods, 1996, Pattern Recognit. Lett.

[2] James C. Bezdek, et al. Pattern Recognition with Fuzzy Objective Function Algorithms, 1981, Advanced Applications in Pattern Recognition.

[3] Richard P. Lippmann, et al. An introduction to computing with neural nets, 1987.

[4] Witold Pedrycz, et al. Data Mining: A Knowledge Discovery Approach, 2007.

[5] Sanghamitra Bandyopadhyay, et al. Clustering Using Simulated Annealing with Probabilistic Redistribution, 2001, Int. J. Pattern Recognit. Artif. Intell.

[6] Ujjwal Maulik, et al. Incorporating Chromosome Differentiation in Genetic Algorithms, 1998, Inf. Sci.

[7] Yoh-Han Pao, et al. Adaptive pattern recognition and neural networks, 1989.

[8] Ujjwal Maulik, et al. Advanced Methods for Knowledge Discovery from Complex Data, 2005.

[9] Ujjwal Maulik, et al. Genetic algorithm-based clustering technique, 2000, Pattern Recognit.

[10] Jiawei Han, et al. Efficient and Effective Clustering Methods for Spatial Data Mining, 1994, VLDB.

[11] Ujjwal Maulik, et al. An evolutionary technique based on K-Means algorithm for optimal clustering in R^N, 2002, Inf. Sci.

[12] Sanghamitra Bandyopadhyay, et al. VGA-Classifier: design and applications, 2000, IEEE Trans. Syst. Man Cybern. Part B.

[13] Shokri Z. Selim, et al. K-Means-Type Algorithms: A Generalized Convergence Theorem and Characterization of Local Optimality, 1984, IEEE Trans. Pattern Anal. Mach. Intell.

[14] Sanghamitra Bandyopadhyay, et al. Classification and learning using genetic algorithms: applications in bioinformatics and web intelligence, 2007, Natural computing series.

[15] Guoqing Chen, et al. Mining generalized association rules with fuzzy taxonomic structures, 1999, 18th International Conference of the North American Fuzzy Information Processing Society (NAFIPS).

[16] Simon Haykin, et al. Neural Networks: A Comprehensive Foundation, 1998.

[17] Michael R. Anderberg, et al. Cluster Analysis for Applications, 1973.

[18] Hans-Peter Kriegel, et al. Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification, 1995, SSD.

[19] Tian Zhang, et al. BIRCH: an efficient data clustering method for very large databases, 1996, SIGMOD '96.

[20] Ramakrishnan Srikant, et al. Fast algorithms for mining association rules, 1994, VLDB.

[21] Thomas M. Cover, et al. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition, 1965, IEEE Trans. Electron. Comput.

[22] Sanghamitra Bandyopadhyay, et al. Genetic algorithms for generation of class boundaries, 1998, IEEE Trans. Syst. Man Cybern. Part B.

[23] Teuvo Kohonen, et al. Self-Organization and Associative Memory, 1988.

[24] S. Bandyopadhyay, et al. Nonparametric genetic clustering: comparison of validity indices, 2001, IEEE Trans. Syst. Man Cybern. Syst.

[25] Gerardo Beni, et al. A Validity Measure for Fuzzy Clustering, 1991, IEEE Trans. Pattern Anal. Mach. Intell.

[26] Sanghamitra Bandyopadhyay. Simulated annealing using a reversible jump Markov chain Monte Carlo algorithm for fuzzy clustering, 2005, IEEE Trans. Knowl. Data Eng.

[27] Ujjwal Maulik, et al. Performance Evaluation of Some Clustering Algorithms and Validity Indices, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[28] John H. Holland, et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, 1992.

[29] Nicholas J. Radcliffe, et al. A Genetic Algorithm-Based Approach to Data Mining, 1996, KDD.

[30] Philip K. Chan, et al. Advances in Distributed and Parallel Knowledge Discovery, 2000.

[31] Lotfi A. Zadeh. Fuzzy Sets, 1965, Inf. Control.

[32] N. J. Radcliffe, et al. GA-MINER: Parallel Data Mining with Hierarchical Genetic Algorithms, Final Report, 1995.

[33] Marley M. B. R. Vellasco, et al. Rule-Evolver: An Evolutionary Approach for Data Mining, 1999, RSFDGrC.

[34] Anupam Joshi, et al. Low-complexity fuzzy relational clustering algorithms for Web mining, 2001, IEEE Trans. Fuzzy Syst.

[35] Hichem Frigui, et al. A Robust Competitive Clustering Algorithm With Applications in Computer Vision, 1999, IEEE Trans. Pattern Anal. Mach. Intell.

[36] Witold Pedrycz, et al. Fuzzy set technology in knowledge discovery, 1998, Fuzzy Sets Syst.

[37] Ding-An Chiang, et al. Mining time series data by a fuzzy linguistic summary system, 2000, Fuzzy Sets Syst.

[38] Ujjwal Maulik, et al. Validity index for crisp and fuzzy clusters, 2004, Pattern Recognit.

[39] Sanghamitra Bandyopadhyay, et al. Pattern classification with genetic algorithms, 1995, Pattern Recognit. Lett.

[40] Anil K. Jain, et al. Algorithms for Clustering Data, 1988.

[41] Sanghamitra Bandyopadhyay, et al. Satellite image classification using genetically guided fuzzy clustering with spatial information, 2005.

[42] Erik D. Goodman, et al. Genetic programming for improved data mining: application to the biochemistry of protein interactions, 1996.

[43] Peter J. Rousseeuw, et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1990.

[44] T. W. Anderson. An Introduction to Multivariate Statistical Analysis, 1959.

[45] Keinosuke Fukunaga, et al. Introduction to Statistical Pattern Recognition, 1972.

[46] Hans-Peter Kriegel, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[47] Ujjwal Maulik, et al. Clustering distributed data streams in peer-to-peer environments, 2006, Inf. Sci.

[48] Julius T. Tou, et al. Pattern Recognition Principles, 1974.

[50] Sanghamitra Bandyopadhyay, et al. Pixel classification using variable string genetic algorithms with chromosome differentiation, 2001, IEEE Trans. Geosci. Remote Sens.

[51] Ronald R. Yager. Database discovery using fuzzy sets, 1996.

[52] Josef Kittler, et al. Pattern recognition: a statistical approach, 1982.

[53] Ujjwal Maulik, et al. Fuzzy partitioning using a real-coded variable-length genetic algorithm for pixel classification, 2003, IEEE Trans. Geosci. Remote Sens.

[54] Nicolas Monmarché, et al. Interactive Design of Web Sites with a Genetic Algorithm, 2002, ICWI.

[55] Richard O. Duda, et al. Pattern classification and scene analysis, 1974, Wiley-Interscience.

[56] Judith E. Dayhoff, et al. Neural Network Architectures: An Introduction, 1989.

[57] Jiawei Han, et al. Data Mining: Concepts and Techniques, 2000.

[58] John A. Hartigan, et al. Clustering Algorithms, 1975.