Uncertainty Handling and Quality Assessment in Data Mining

Uncertainty Handling and Quality Assessment in Data Mining provides an introduction to the application of these concepts in Knowledge Discovery and Data Mining. It reviews the state-of-the-art in uncertainty handling and discusses a framework for unveiling and handling uncertainty. Coverage of quality assessment begins with an introduction to cluster analysis and a comparison of the methods and approaches that may be used. The techniques and algorithms involved in other essential data mining tasks, such as classification and extraction of association rules, are also discussed together with a review of the quality criteria and techniques for evaluating the data mining results. This book presents a general framework for assessing quality and handling uncertainty which is based on tested concepts and theories. This framework forms the basis of an implementation tool, 'Uminer' which is introduced to the reader for the first time. This tool supports the key data mining tasks while enhancing the traditional processes for handling uncertainty and assessing quality. Aimed at IT professionals involved with data mining and knowledge discovery, the work is supported with case studies from epidemiology and telecommunications that illustrate how the tool works in 'real world' data mining projects. The book would also be of interest to final year undergraduates or post-graduate students looking at: databases, algorithms, artificial intelligence and information systems particularly with regard to uncertainty and quality assessment.

[1]  Eamonn J. Keogh,et al.  Scaling up dynamic time warping for datamining applications , 2000, KDD '00.

[2]  Eamonn J. Keogh,et al.  Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases , 2001, Knowledge and Information Systems.

[3]  Takahiko Horiuchi,et al.  Decision Rule for Pattern Classification by Integrating Interval Feature Values , 1998, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Philip S. Yu,et al.  Data Mining: An Overview from a Database Perspective , 1996, IEEE Trans. Knowl. Data Eng..

[5]  Dimitrios Gunopulos,et al.  Constraint-Based Rule Mining in Large, Dense Databases , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[6]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[7]  Michalis Vazirgiannis,et al.  A Data Set Oriented Approach for Clustering Algorithm Selection , 2001, PKDD.

[8]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[9]  Heikki Mannila,et al.  Fast Discovery of Association Rules , 1996, Advances in Knowledge Discovery and Data Mining.

[10]  Christos Faloutsos,et al.  FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets , 1995, SIGMOD '95.

[11]  Sudipto Guha,et al.  CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.

[12]  Thomas G. Dietterich Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms , 1998, Neural Computation.

[13]  Michalis Vazirgiannis,et al.  Cluster validity methods: part I , 2002, SGMD.

[14]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[15]  Joseph B. Kruskal,et al.  Time Warps, String Edits, and Macromolecules , 1999 .

[16]  Jorma Rissanen,et al.  SLIQ: A Fast Scalable Classifier for Data Mining , 1996, EDBT.

[17]  Allan Borodin,et al.  Subquadratic approximation algorithms for clustering problems in high dimensional spaces , 1999, STOC '99.

[18]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[19]  Yiyu Yao,et al.  Peculiarity Oriented Multi-database Mining , 1999, PKDD.

[20]  Michael J. A. Berry,et al.  Data mining techniques - for marketing, sales, and customer support , 1997, Wiley computer publishing.

[21]  Michalis Vazirgiannis,et al.  UMiner: A Data Mining System Handling Uncertainty and Quality , 2002, EDBT.

[22]  A. Gyenesei,et al.  Determining Fuzzy Sets for Quantitative Attributes in Data Mining Problems , 2000 .

[23]  JOHANNES GEHRKE,et al.  RainForest—A Framework for Fast Decision Tree Construction of Large Datasets , 1998, Data Mining and Knowledge Discovery.

[24]  Stephen L. Chiu,et al.  Extracting Fuzzy Rules from Data for Function Approximation and Pattern Classification , 2000 .

[25]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[26]  Christos Faloutsos,et al.  Searching Multimedia Databases by Content , 1996, Advances in Database Systems.

[27]  Ramakrishnan Srikant,et al.  Mining Association Rules with Item Constraints , 1997, KDD.

[28]  T. M. Murali,et al.  A Monte Carlo algorithm for fast projective clustering , 2002, SIGMOD '02.

[29]  Sudipto Guha,et al.  ROCK: a robust clustering algorithm for categorical attributes , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[30]  Maria E. Orlowska,et al.  CCAIIA: Clustering Categorial Attributed into Interseting Accociation Rules , 1998, PAKDD.

[31]  Laks V. S. Lakshmanan,et al.  Exploratory mining and pruning optimizations of constrained associations rules , 1998, SIGMOD '98.

[32]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[33]  Ambuj K. Singh,et al.  Variable length queries for time series data , 2001, Proceedings 17th International Conference on Data Engineering.

[34]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[35]  Michalis Vazirgiannis,et al.  Clustering validity assessment: finding the optimal partitioning of a data set , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[36]  Donald W. Bouldin,et al.  A Cluster Separation Measure , 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[38]  Ust Beijing,et al.  Data Mining and Knowledge Discovery in Databases , 1999 .

[39]  Nikhil R. Pal,et al.  Cluster validation using graph theoretic concepts , 1997, Pattern Recognit..

[40]  Rajeev Motwani,et al.  Dynamic itemset counting and implication rules for market basket data , 1997, SIGMOD '97.

[41]  Eamonn J. Keogh,et al.  Exact indexing of dynamic time warping , 2002, Knowledge and Information Systems.

[42]  Jeffrey Scott Vitter,et al.  Data cube approximation and histograms via wavelets , 1998, CIKM '98.

[43]  James M. Keller,et al.  A possibilistic approach to clustering , 1993, IEEE Trans. Fuzzy Syst..

[44]  Dimitrios Gunopulos,et al.  An Adaptive Metric Machine for Pattern Classification , 2000, NIPS.

[45]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[46]  Haixun Wang,et al.  Landmarks: a new model for similarity-based pattern querying in time series databases , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[47]  Subhash Sharma Applied multivariate techniques , 1995 .

[48]  Padhraic Smyth,et al.  Deformable Markov model templates for time-series pattern matching , 2000, KDD '00.

[49]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[50]  Myron Wish,et al.  Three-Way Multidimensional Scaling , 1978 .

[51]  Qiming Chen,et al.  PrefixSpan,: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[52]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[53]  Eamonn J. Keogh,et al.  Locally adaptive dimensionality reduction for indexing large time series databases , 2001, SIGMOD '01.

[54]  Johannes Gehrke,et al.  MAFIA: a maximal frequent itemset algorithm for transactional databases , 2001, Proceedings 17th International Conference on Data Engineering.

[55]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[56]  Christos Faloutsos,et al.  Efficient Similarity Search In Sequence Databases , 1993, FODO.

[57]  Heikki Mannila,et al.  Random projection in dimensionality reduction: applications to image and text data , 2001, KDD '01.

[58]  Jiong Yang,et al.  STING: A Statistical Information Grid Approach to Spatial Data Mining , 1997, VLDB.

[59]  Christos Faloutsos,et al.  Efficient retrieval of similar time sequences under time warping , 1998, Proceedings 14th International Conference on Data Engineering.

[60]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[61]  Heikki Mannila,et al.  Finding interesting rules from large sets of discovered association rules , 1994, CIKM '94.

[62]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[63]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[64]  Vladimir N. Vapnik,et al.  The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.

[65]  Alberto O. Mendelzon,et al.  Similarity-based queries , 1995, PODS '95.

[66]  L. Zadeh Fuzzy sets as a basis for a theory of possibility , 1999 .

[67]  G. W. Milligan,et al.  An examination of procedures for determining the number of clusters in a data set , 1985 .

[68]  M. Vazirgiannis,et al.  Clustering validity assessment using multi representatives , 2002 .

[69]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[70]  Casimir A. Kulikowski,et al.  Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems , 1990 .

[71]  Cezary Z. Janikow,et al.  Exemplar learning in fuzzy decision trees , 1996, Proceedings of IEEE 5th International Fuzzy Systems.

[72]  Michalis Vazirgiannis,et al.  Quality Scheme Assessment in the Clustering Process , 2000, PKDD.

[73]  James C. Bezdek,et al.  Nerf c-means: Non-Euclidean relational fuzzy clustering , 1994, Pattern Recognit..

[74]  J. Berger Statistical Decision Theory and Bayesian Analysis , 1988 .

[75]  Hichem Frigui,et al.  Quadratic shell clustering algorithms and the detection of second-degree curves , 1993 .

[76]  Daniel A. Keim,et al.  An Efficient Approach to Clustering in Large Multimedia Databases with Noise , 1998, KDD.

[77]  Yannis Manolopoulos,et al.  C2P: Clustering based on Closest Pairs , 2001, VLDB.

[78]  T. F. O'Brien,et al.  WHONET: an information system for monitoring antimicrobial resistance. , 1995, Emerging infectious diseases.

[79]  Jian Pei,et al.  Can we push more constraints into frequent pattern mining? , 2000, KDD '00.

[80]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[81]  Robert Tibshirani,et al.  Discriminant Adaptive Nearest Neighbor Classification , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[82]  Padhraic Smyth,et al.  Rule Induction Using Information Theory , 1991, Knowledge Discovery in Databases.

[83]  Gregory Piatetsky-Shapiro,et al.  Advances in Knowledge Discovery and Data Mining , 2004, Lecture Notes in Computer Science.

[84]  Peter C. Cheeseman,et al.  Bayesian Classification (AutoClass): Theory and Results , 1996, Advances in Knowledge Discovery and Data Mining.

[85]  James C. Bezdek,et al.  On relational data versions of c-means algorithms , 1996, Pattern Recognit. Lett..

[86]  Rajesh N. Davé,et al.  Validating fuzzy partitions obtained through c-shells clustering , 1996, Pattern Recognit. Lett..

[87]  Gregory Piatetsky-Shapiro,et al.  Discovery, Analysis, and Presentation of Strong Rules , 1991, Knowledge Discovery in Databases.

[88]  Dimitrios Gunopulos,et al.  Data mining, hypergraph transversals, and machine learning (extended abstract) , 1997, PODS.

[89]  Piotr Indyk A sublinear time approximation scheme for clustering in metric spaces , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).

[90]  Mohammed J. Zaki Efficient enumeration of frequent sequences , 1998, CIKM '98.

[91]  Isak Gath,et al.  Unsupervised Optimal Fuzzy Clustering , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[92]  Christos Faloutsos,et al.  Fast Time Sequence Indexing for Arbitrary Lp Norms , 2000, VLDB.

[93]  Heikki Mannila,et al.  Discovery of Frequent Episodes in Event Sequences , 1997, Data Mining and Knowledge Discovery.

[94]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[95]  Michalis Vazirgiannis,et al.  Managing Uncertainty and Quality in the Classification Process , 2002, SETN.

[96]  Gerardo Beni,et al.  A Validity Measure for Fuzzy Clustering , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[97]  Padhraic Smyth,et al.  Clustering Using Monte Carlo Cross-Validation , 1996, KDD.

[98]  Boudewijn P. F. Lelieveldt,et al.  A new cluster validity index for the fuzzy c-mean , 1998, Pattern Recognit. Lett..

[99]  Carlos Bento,et al.  A Metric for Selection of the Most Promising Rules , 1998, PKDD.

[100]  Attila Gyenesei Mining Weighted Association Rules for Fuzzy Quantitative Items , 2000, PKDD.

[101]  Ramakrishnan Srikant,et al.  Mining generalized association rules , 1995, Future Gener. Comput. Syst..

[102]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[103]  Joshua Zhexue Huang,et al.  A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining , 1997, DMKD.

[104]  Anil K. Jain,et al.  Data clustering: a review , 1999, CSUR.

[105]  Wynne Hsu,et al.  Using General Impressions to Analyze Discovered Classification Rules , 1997, KDD.