Knowledge Discovery and Data Mining

Knowledge discovery and data mining has emerged as an important research direction for extracting useful information from vast repositories of data of various types. This chapter discusses the basic concepts and issues involved in this process, with special emphasis on the different data mining tasks. It then outlines the major challenges in data mining. Finally, it describes recent trends in the field and provides an extensive bibliography.
