Data Mining: Concepts and Techniques

The World Wide Web, as a global information system, has flooded us with a tremendous amount of data and information. In this age of information technology, our capacity to generate and collect data grows rapidly every day. This explosive growth in stored data has created an urgent need for new technologies and automated tools that help transform the data into useful information and knowledge. Data mining, popularly known as knowledge discovery in databases (KDD), addresses this need: it is the automated or convenient extraction of patterns representing knowledge implicitly stored in large databases. This article explains what data mining is and why it is important. It also covers the basic concepts of data mining, data clustering, and data mining rules, and discusses how to choose a data mining system, with some examples.

[1]  J. R. Quinlan Learning With Continuous Classes , 1992 .

[2]  Philip S. Yu,et al.  Graph indexing: a frequent structure-based approach , 2004, SIGMOD '04.

[3]  Michael Stonebraker,et al.  Readings in Database Systems: Fourth Edition , 2005 .

[4]  Son K. Dao,et al.  Dealing with Semantic Heterogeneity by Generalization-Based Data Mining Techniques , 2007 .

[5]  Leonid Khachiyan,et al.  Cubegrades: Generalizing Association Rules , 2002, Data Mining and Knowledge Discovery.

[6]  Ben Taskar,et al.  Probabilistic Classification and Clustering in Relational Data , 2001, IJCAI.

[7]  Ke Wang,et al.  Building Hierarchical Classifiers Using Class Proximity , 1999, VLDB.

[8]  W. H. Inmon,et al.  Building the data warehouse , 1992 .

[9]  Raymond J. Mooney,et al.  Symbolic and neural learning algorithms: An experimental comparison , 1991, Machine Learning.

[10]  Jon M. Kleinberg,et al.  A Microeconomic View of Data Mining , 1998, Data Mining and Knowledge Discovery.

[11]  John Riedl,et al.  GroupLens: an open architecture for collaborative filtering of netnews , 1994, CSCW '94.

[12]  Carlo Zaniolo,et al.  Metaqueries for Data Mining , 1996, Advances in Knowledge Discovery and Data Mining.

[13]  Huan Liu,et al.  Subspace clustering for high dimensional data: a review , 2004, SKDD.

[14]  Sunita Sarawagi,et al.  i3: intelligent, interactive investigation of OLAP data cubes , 2000, SIGMOD '00.

[15]  ZhaoHui Tang,et al.  Building data mining solutions with OLE DB for DM and XML for analysis , 2005, SGMD.

[16]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[17]  Y.-S. Shih,et al.  Families of splitting criteria for classification trees , 1999, Stat. Comput..

[18]  Prabhakar Raghavan,et al.  Information retrieval algorithms: a survey , 1997, SODA '97.

[19]  Mong-Li Lee,et al.  Image Mining: Trends and Developments , 2002, Journal of Intelligent Information Systems.

[20]  Joseph L. Hellerstein,et al.  Mining partially periodic event patterns with unknown periods , 2001, Proceedings 17th International Conference on Data Engineering.

[21]  Ada Wai-Chee Fu,et al.  Finding Structure and Characteristics of Web Documents for Classification , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[22]  Sudipto Guha,et al.  CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.

[23]  Jiawei Han,et al.  CPAR: Classification based on Predictive Association Rules , 2003, SDM.

[24]  Wei-Yin Loh,et al.  A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms , 2000, Machine Learning.

[25]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[26]  Teuvo Kohonen,et al.  Self-organized formation of topologically correct feature maps , 2004, Biological Cybernetics.

[27]  Tom M. Mitchell,et al.  Version Spaces: A Candidate Elimination Approach to Rule Learning , 1977, IJCAI.

[28]  Jan-Ming Ho,et al.  Discovering informative content blocks from Web documents , 2002, KDD.

[29]  Edward Omiecinski,et al.  Alternative Interest Measures for Mining Associations in Databases , 2003, IEEE Trans. Knowl. Data Eng..

[30]  David M. Pennock,et al.  Statistical relational learning for document mining , 2003, Third IEEE International Conference on Data Mining.

[31]  Willi Klösgen,et al.  A Support System for Interpreting Statistical Data , 1991, Knowledge Discovery in Databases.

[32]  Jian Pei,et al.  CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[33]  Mohammed J. Zaki,et al.  SPADE: An Efficient Algorithm for Mining Frequent Sequences , 2004, Machine Learning.

[34]  Robert A. Jacobs,et al.  Increased rates of convergence through learning rate adaptation , 1987, Neural Networks.

[35]  Philip S. Yu,et al.  Efficient parallel data mining for association rules , 1995, CIKM '95.

[36]  Takashi Washio,et al.  An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data , 2000, PKDD.

[37]  Janusz Zalewski,et al.  Rough sets: Theoretical aspects of reasoning about data , 1996 .

[38]  Laks V. S. Lakshmanan,et al.  Quotient Cube: How to Summarize the Semantics of a Data Cube , 2002, VLDB.

[39]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems , 1988 .

[40]  Samuel Madden,et al.  Continuously adaptive continuous queries over streams , 2002, SIGMOD '02.

[41]  Sunita Sarawagi,et al.  Integrating association rule mining with relational database systems: alternatives and implications , 1998, SIGMOD '98.

[42]  Alberto O. Mendelzon,et al.  Similarity-based queries for time series data , 1997, SIGMOD '97.

[43]  Laks V. S. Lakshmanan,et al.  Mining frequent itemsets with convertible constraints , 2001, Proceedings 17th International Conference on Data Engineering.

[44]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[45]  Michael Stonebraker,et al.  Efficient organization of large multidimensional arrays , 1994, Proceedings of 1994 IEEE 10th International Conference on Data Engineering.

[46]  Christos Faloutsos,et al.  Prediction and indexing of moving objects with unknown motion patterns , 2004, SIGMOD '04.

[47]  Shamkant B. Navathe,et al.  Mining for strong negative associations in a large database of customer transactions , 1998, Proceedings 14th International Conference on Data Engineering.

[48]  Joshua Zhexue Huang,et al.  Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values , 1998, Data Mining and Knowledge Discovery.

[49]  Jeffrey F. Naughton,et al.  An array-based algorithm for simultaneous multidimensional aggregates , 1997, SIGMOD '97.

[50]  William Frawley,et al.  Knowledge Discovery in Databases , 1991 .

[51]  B. Marx The Visual Display of Quantitative Information , 1985 .

[52]  John F. Roddick,et al.  Geographic Data Mining and Knowledge Discovery , 2001 .

[53]  Kyuseok Shim,et al.  PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning , 1998, Data Mining and Knowledge Discovery.

[54]  Heikki Mannila,et al.  A database perspective on knowledge discovery , 1996, CACM.

[55]  Eli Upfal,et al.  Stochastic models for the Web graph , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[56]  Shamkant B. Navathe,et al.  An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.

[57]  J. Ross Quinlan,et al.  An Empirical Comparison of Genetic and Decision-Tree Classifiers , 1988, ML.

[58]  Julian R. Ullmann,et al.  An Algorithm for Subgraph Isomorphism , 1976, J. ACM.

[59]  Thomas G. Dietterich,et al.  Readings in Machine Learning , 1991 .

[60]  V. S. Subrahmanian Principles of Multimedia Database Systems , 1998 .

[61]  Jiawei Han,et al.  Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases , 1994, KDD Workshop.

[62]  Ronald L. Rivest,et al.  Inferring Decision Trees Using the Minimum Description Length Principle , 1989, Inf. Comput..

[63]  Jon M. Kleinberg,et al.  The Web as a Graph: Measurements, Models, and Methods , 1999, COCOON.

[64]  John F. Roddick,et al.  An Updated Bibliography of Temporal, Spatial, and Spatio-temporal Data Mining Research , 2000, TSDM.

[65]  Douglas H. Fisher,et al.  A Case Study of Incremental Concept Induction , 1986, AAAI.

[66]  Philip S. Yu,et al.  Cross-relational clustering with user's guidance , 2005, KDD '05.

[67]  Jennifer Widom,et al.  Research problems in data warehousing , 1995, CIKM '95.

[68]  Jiawei Han,et al.  GeoMiner: a system prototype for spatial data mining , 1997, SIGMOD '97.

[69]  Philip S. Yu,et al.  Clustering through decision tree construction , 2000, CIKM '00.

[70]  Samuel Kaski,et al.  Self organization of a massive document collection , 2000, IEEE Trans. Neural Networks Learn. Syst..

[71]  Abraham Silberschatz,et al.  What Makes Patterns Interesting in Knowledge Discovery Systems , 1996, IEEE Trans. Knowl. Data Eng..

[72]  Joseph Revelli,et al.  The Image Processing Handbook, 4th Edition , 2003, J. Electronic Imaging.

[73]  Jennifer Neville,et al.  Learning relational probability trees , 2003, KDD '03.

[74]  Jaideep Srivastava,et al.  Web usage mining: discovery and applications of usage patterns from Web data , 2000, SKDD.

[75]  Joost N. Kok,et al.  A quickstart in frequent structure mining can make a difference , 2004, KDD.

[76]  Jaideep Srivastava,et al.  Selecting the right interestingness measure for association patterns , 2002, KDD.

[77]  M. Stone Cross‐Validatory Choice and Assessment of Statistical Predictions , 1976 .

[78]  Hongjun Lu,et al.  Condensed cube: an effective approach to reducing data cube size , 2002, Proceedings 18th International Conference on Data Engineering.

[79]  Philip S. Yu,et al.  Mining Asynchronous Periodic Patterns in Time Series Data , 2003, IEEE Trans. Knowl. Data Eng..

[80]  David Konopnicki,et al.  W3QS: A Query System for the World-Wide Web , 1995, VLDB.

[81]  George Karypis,et al.  C HAMELEON : A Hierarchical Clustering Algorithm Using Dynamic Modeling , 1999 .

[82]  Jesus Mena,et al.  Investigative Data Mining for Security and Criminal Detection , 2002 .

[83]  Anthony K. H. Tung,et al.  Spatial clustering in the presence of obstacles , 2001, Proceedings 17th International Conference on Data Engineering.

[84]  Pat Langley,et al.  Elements of Machine Learning , 1995 .

[85]  Dorothea Heiss-Czedik,et al.  An Introduction to Genetic Algorithms. , 1997, Artificial Life.

[86]  Ronald R. Yager,et al.  Fuzzy sets, neural networks, and soft computing , 1994 .

[87]  Richard A. Johnson,et al.  Applied Multivariate Statistical Analysis , 1983 .

[88]  Jeffrey Scott Vitter,et al.  Data cube approximation and histograms via wavelets , 1998, CIKM '98.

[89]  John F. Roddick,et al.  On the impact of knowledge discovery and data mining , 2000 .

[90]  Herbert A. Simon,et al.  Scientific discovery: compulalional explorations of the creative process , 1987 .

[91]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[92]  John Riedl,et al.  Item-based collaborative filtering recommendation algorithms , 2001, WWW '01.

[93]  Nimrod Megiddo,et al.  Discovery-Driven Exploration of OLAP Data Cubes , 1998, EDBT.

[94]  I. Bratko,et al.  Learning decision rules in noisy domains , 1987 .

[95]  Mathias Kirsten,et al.  Extending K-Means Clustering to First-Order Representations , 2000, ILP.

[96]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[97]  Philip J. Stone,et al.  Experiments in induction , 1966 .

[98]  Ralph Kimball,et al.  The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing and Deploying Data Warehouses with CD Rom , 1998 .

[99]  Huan Liu,et al.  Chi2: feature selection and discretization of numeric attributes , 1995, Proceedings of 7th IEEE International Conference on Tools with Artificial Intelligence.

[100]  Hans-Peter Kriegel,et al.  Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification , 1995, SSD.

[101]  Wei Wang,et al.  Efficient mining of frequent subgraphs in the presence of isomorphism , 2003, Third IEEE International Conference on Data Mining.

[102]  Jennifer Widom,et al.  Clustering association rules , 1997, Proceedings 13th International Conference on Data Engineering.

[103]  David Maxwell Chickering,et al.  Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.

[104]  Wojciech Szpankowski,et al.  An efficient algorithm for detecting frequent subgraphs in biological networks , 2004, ISMB/ECCB.

[105]  Shashi Shekhar,et al.  Spatial Databases: A Tour , 2003 .

[106]  Qiang Yang,et al.  Plan Mining by Divide-and-Conquer , 1999, 1999 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[107]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[108]  Arie Shoshani,et al.  OLAP and statistical databases: similarities and differences , 1997, PODS '97.

[109]  Stuart J. Russell,et al.  Local Learning in Probabilistic Networks with Hidden Variables , 1995, IJCAI.

[110]  M. A. Wincek Applied Statistical Time Series Analysis , 1990 .

[111]  S. Lauritzen The EM algorithm for graphical association models with missing data , 1995 .

[112]  Mark E. J. Newman,et al.  The Structure and Function of Complex Networks , 2003, SIAM Rev..

[113]  John A. Hartigan,et al.  Clustering Algorithms , 1975 .

[114]  Elisa Bertino,et al.  State-of-the-art in privacy preserving data mining , 2004, SGMD.

[115]  David Zipser,et al.  Feature Discovery by Competive Learning , 1986, Cogn. Sci..

[116]  Philip S. Yu,et al.  Substructure similarity search in graph databases , 2005, SIGMOD '05.

[117]  Andrzej Lenarcik,et al.  Probabilistic Rough Classifiers with Mixtures of Discrete and Continuous Attributes , 1997 .

[118]  Laks V. S. Lakshmanan,et al.  A declarative language for querying and restructuring the Web , 1996, Proceedings RIDE '96. Sixth International Workshop on Research Issues in Data Engineering.

[119]  Divyakant Agrawal,et al.  Efficient Computation of Frequent and Top-k Elements in Data Streams , 2005, ICDT.

[120]  Douglas B. Terry,et al.  Continuous queries over append-only databases , 1992, SIGMOD '92.

[121]  W. Loh,et al.  Tree-Structured Classification via Generalized Discriminant Analysis. , 1988 .

[122]  Jeffrey Scott Vitter,et al.  Random sampling with a reservoir , 1985, TOMS.

[123]  J. Ross Quinlan,et al.  Bagging, Boosting, and C4.5 , 1996, AAAI/IAAI, Vol. 1.

[124]  Jiawei Han,et al.  Selective Materialization: An Efficient Method for Spatial Data Cube Construction , 1998, PAKDD.

[125]  Jian Pei,et al.  Efficient computation of Iceberg cubes with complex measures , 2001, SIGMOD '01.

[126]  Jeffrey F. Naughton,et al.  Materialized View Selection for Multidimensional Datasets , 1998, VLDB.

[127]  Philip S. Yu,et al.  Clustering by pattern similarity in large data sets , 2002, SIGMOD '02.

[128]  Wynne Hsu,et al.  Using General Impressions to Analyze Discovered Classification Rules , 1997, KDD.

[129]  Jiawei Han,et al.  Summarizing itemset patterns: a profile-based approach , 2005, KDD '05.

[130]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[131]  Sudipto Guha,et al.  ROCK: a robust clustering algorithm for categorical attributes , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[132]  Jiawei Han,et al.  Mining Compressed Frequent-Pattern Sets , 2005, VLDB.

[133]  Mohammed J. Zaki Efficiently mining frequent trees in a forest , 2002, KDD.

[134]  Jeffrey C. Schlimmer Learning and Representation Change , 1987, AAAI.

[135]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[136]  John Mingers,et al.  An Empirical Comparison of Pruning Methods for Decision Tree Induction , 1989, Machine Learning.

[137]  김동규,et al.  [서평]「Algorithms on Strings, Trees, and Sequences」 , 2000 .

[138]  Kyuseok Shim,et al.  WALRUS: A Similarity Retrieval Algorithm for Image Databases , 2004, IEEE Trans. Knowl. Data Eng..

[139]  Jiawei Han,et al.  Resource and Knowledge Discovery in Global Information Systems: A Preliminary Design and Experiment , 1995, KDD.

[140]  John Scott Social Network Analysis , 1988 .

[141]  Jiong Yang,et al.  STING: A Statistical Information Grid Approach to Spatial Data Mining , 1997, VLDB.

[142]  Douglas B. Terry,et al.  Using collaborative filtering to weave an information tapestry , 1992, CACM.

[143]  Christos Faloutsos,et al.  Efficient retrieval of similar time sequences under time warping , 1998, Proceedings 14th International Conference on Data Engineering.

[144]  Bernard Widrow,et al.  Neural networks: applications in industry, business and science , 1994, CACM.

[145]  Jiong Yang,et al.  CLUSEQ: efficient and effective sequence clustering , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[146]  Ashish Gupta,et al.  Materialized views: techniques, implementations, and applications , 1999 .

[147]  Bei Yu,et al.  A cross-collection mixture model for comparative text mining , 2004, KDD.

[148]  Bernhard Schölkopf,et al.  Shrinking the Tube: A New Support Vector Regression Algorithm , 1998, NIPS.

[149]  D. Bates,et al.  Mixed-Effects Models in S and S-PLUS , 2001 .

[150]  Patrick E. O'Neil,et al.  Improved query performance with variant indexes , 1997, SIGMOD '97.

[151]  Sridhar Ramaswamy,et al.  Cyclic association rules , 1998, Proceedings 14th International Conference on Data Engineering.

[152]  Leo Katz,et al.  A new status index derived from sociometric analysis , 1953 .

[153]  David Heckerman,et al.  Bayesian Networks for Knowledge Discovery , 1996, Advances in Knowledge Discovery and Data Mining.

[154]  Jiawei Han,et al.  Generalization and decision tree induction: efficient classification in data mining , 1997, Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications.

[155]  Howard J. Hamilton,et al.  Knowledge discovery and measures of interest , 2001 .

[156]  A. Akhmetova Discovery of Frequent Episodes in Event Sequences , 2006 .

[157]  Renée J. Miller,et al.  Association rules over interval data , 1997, SIGMOD '97.

[158]  Jennifer Widom,et al.  Database Systems: The Complete Book , 2001 .

[159]  Xuehua Shen,et al.  Context-sensitive information retrieval using implicit feedback , 2005, SIGIR '05.

[160]  George Karypis,et al.  Frequent subgraph discovery , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[161]  Stephen Muggleton,et al.  Efficient Induction of Logic Programs , 1990, ALT.

[162]  Jiawei Han,et al.  Object-Based Selective Materialization for Efficient Implementation of Spatial Data Cubes , 2000, IEEE Trans. Knowl. Data Eng..

[163]  Michael Ian Shamos,et al.  Computational geometry: an introduction , 1985 .

[164]  Jorma Rissanen,et al.  MDL-Based Decision Tree Pruning , 1995, KDD.

[165]  Ron Kohavi,et al.  Mining e-commerce data: the good, the bad, and the ugly , 2001, KDD '01.

[166]  Heikki Mannila,et al.  Theoretical frameworks for data mining , 2000, SKDD.

[167]  Christopher Dean,et al.  Quakefinder: A Scalable Data Mining System for Detecting Earthquakes from Space , 1996, KDD.

[168]  Mohammed J. Zaki,et al.  PlanMine: Sequence Mining for Plan Failures , 1998, KDD.

[169]  Thomas C. Redman,et al.  Data Quality: The Field Guide , 2001 .

[170]  R. Nakano,et al.  Medical diagnostic expert system based on PDP model , 1988, IEEE 1988 International Conference on Neural Networks.

[171]  Jian Pei,et al.  CMAR: accurate and efficient classification based on multiple class-association rules , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[172]  David Loshin Enterprise knowledge management: the data quality approach , 2000 .

[173]  Ehud Gudes,et al.  Computing frequent graph patterns from semistructured data , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[174]  Wei-Ying Ma,et al.  Locality preserving indexing for document representation , 2004, SIGIR '04.

[175]  Takashi Washio,et al.  State of the art of graph-based data mining , 2003, SKDD.

[176]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[177]  Huan Liu,et al.  Discretization: An Enabling Technique , 2002, Data Mining and Knowledge Discovery.

[178]  Jack Sklansky,et al.  On Automatic Feature Selection , 1988, Int. J. Pattern Recognit. Artif. Intell..

[179]  Dennis Shasha,et al.  StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time , 2002, VLDB.

[180]  Ryszard S. Michalski,et al.  A theory and methodology of inductive learning , 1993 .

[181]  Hongyan Liu,et al.  C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[182]  Allen T. Craig,et al.  Introduction to Mathematical Statistics (6th Edition) , 2005 .

[183]  Umeshwar Dayal,et al.  Multi-dimensional sequential pattern mining , 2001, CIKM '01.

[184]  Maurice Bruynooghe,et al.  Predictive data mining in intensive care , 2006 .

[185]  Jaideep Srivastava,et al.  Web Mining — Concepts, Applications, and Research Directions , 2004 .

[186]  Paul E. Utgoff,et al.  Decision Tree Induction Based on Efficient Tree Restructuring , 1997, Machine Learning.

[187]  Philip S. Yu,et al.  CrossMine: efficient classification across multiple database relations , 2004, Proceedings. 20th International Conference on Data Engineering.

[188]  Wai Lam,et al.  Bayesian Network Refinement Via Machine Learning Approach , 1998, IEEE Trans. Pattern Anal. Mach. Intell..

[189]  Saso Dzeroski,et al.  Inductive Logic Programming: Techniques and Applications , 1993 .

[190]  Ralf Hartmut Güting Dr.rer.nat An introduction to spatial database systems , 2005, The VLDB Journal.

[191]  Jian Pei,et al.  CLOSET+: searching for the best strategies for mining frequent closed itemsets , 2003, KDD '03.

[192]  George H. John Enhancements to the data mining process , 1997 .

[193]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[194]  R. Michalski,et al.  Learning from Observation: Conceptual Clustering , 1983 .

[195]  Simon Haykin,et al.  Neural Networks: A Comprehensive Foundation , 1998 .

[196]  Jiawei Han,et al.  High-Dimensional OLAP: A Minimal Cubing Approach , 2004, VLDB.

[197]  Gregory Piatetsky-Shapiro,et al.  Discovery, Analysis, and Presentation of Strong Rules , 1991, Knowledge Discovery in Databases.

[198]  Marvin Minsky,et al.  Perceptrons: An Introduction to Computational Geometry , 1969 .

[199]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[200]  Jack E. Olson,et al.  Data Quality: The Accuracy Dimension , 2003 .

[201]  Jiawei Han,et al.  Mining coherent dense subgraphs across massive biological networks for functional discovery , 2005, ISMB.

[202]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[203]  Jiawei Han,et al.  Mining closed relational graphs with connectivity constraints , 2005, 21st International Conference on Data Engineering (ICDE'05).

[204]  Joan Feigenbaum,et al.  Factorization in Experiment Generation , 1986, AAAI.

[205]  Sridhar Ramaswamy,et al.  On the Discovery of Interesting Patterns in Association Rules , 1998, VLDB.

[206]  Ramakrishnan Srikant,et al.  Mining generalized association rules , 1995, Future Gener. Comput. Syst..

[207]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[208]  Ian H. Witten,et al.  Managing Gigabytes: Compressing and Indexing Documents and Images , 1999 .

[209]  C. G. Hilborn,et al.  The Condensed Nearest Neighbor Rule , 1967 .

[210]  John Riedl,et al.  An algorithmic framework for performing collaborative filtering , 1999, SIGIR '99.

[211]  Kathryn B. Laskey,et al.  Network Fragments: Representing Knowledge for Constructing Probabilistic Models , 1997, UAI.

[212]  Donato Malerba,et al.  A Further Comparison of Simplification Methods for Decision-Tree Induction , 1995, AISTATS.

[213]  Michael S. Waterman,et al.  Introduction to Computational Biology: Maps, Sequences and Genomes , 1998 .

[214]  Dennis Shasha,et al.  High Performance Discovery In Time Series: Techniques And Case Studies (Monographs in Computer Science) , 2004 .

[215]  Tomasz Imielinski,et al.  MSQL: A Query Language for Database Mining , 1999, Data Mining and Knowledge Discovery.

[216]  Giuseppe Psaila,et al.  A New SQL-like Operator for Mining Association Rules , 1996, VLDB.

[217]  Rob Mattison,et al.  Data Warehousing and Data Mining for Telecommunications , 1997 .

[218]  Jiawei Han,et al.  CloseGraph: mining closed frequent graph patterns , 2003, KDD '03.

[219]  João Meidanis,et al.  Introduction to computational molecular biology , 1997 .

[220]  David J. Spiegelhalter,et al.  Machine Learning, Neural and Statistical Classification , 2009 .

[221]  Hannu Toivonen,et al.  Sampling Large Databases for Association Rules , 1996, VLDB.

[222]  Matthew Richardson,et al.  Mining knowledge-sharing sites for viral marketing , 2002, KDD.

[223]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[224]  Zbigniew Michalewicz,et al.  Genetic Algorithms + Data Structures = Evolution Programs , 1992, Artificial Intelligence.

[225]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[226]  Hongjun Lu,et al.  NeuroRule: A Connectionist Approach to Data Mining , 1995, VLDB.

[227]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[228]  Dimitris Meretakis,et al.  Extending naïve Bayes classifiers using long itemsets , 1999, KDD '99.

[229]  Sergio A. Alvarez,et al.  Efficient Adaptive-Support Association Rule Mining for Recommender Systems , 2004, Data Mining and Knowledge Discovery.

[230]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[231]  Casimir A. Kulikowski,et al.  Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems , 1990 .

[232]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[233]  Hiroshi Motoda,et al.  Feature Extraction, Construction and Selection: A Data Mining Perspective , 1998 .

[234]  Shashi Shekhar,et al.  Spatial Databases - Accomplishments and Research Needs , 1999, IEEE Trans. Knowl. Data Eng..

[235]  Lawrence B. Holder,et al.  Substucture Discovery in the SUBDUE System , 1994, KDD Workshop.

[236]  Yasuhiko Morimoto,et al.  Computing Optimized Rectilinear Regions for Association Rules , 1997, KDD.

[237]  Rafael C. González,et al.  Local Determination of a Moving Contrast Edge , 1985, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[238]  Thomas C. Redman,et al.  Data Quality Management and Technology , 1992 .

[239]  Chris Clifton,et al.  Privacy-preserving k-means clustering over vertically partitioned data , 2003, KDD '03.

[240]  Clement T. Yu,et al.  Priniples of Database Query Processing for Advanced Applications , 1997 .

[241]  M. Pagano,et al.  Survival analysis. , 1996, Nutrition.

[242]  Yann LeCun,et al.  Optimal Brain Damage , 1989, NIPS.

[243]  Wojciech Ziarko,et al.  The Discovery, Analysis, and Representation of Data Dependencies in Databases , 1991, Knowledge Discovery in Databases.

[244]  Laks V. S. Lakshmanan,et al.  Optimization of constrained frequent set queries with 2-variable constraints , 1999, SIGMOD '99.

[245]  Jianyong Wang,et al.  Mining sequential patterns by pattern-growth: the PrefixSpan approach , 2004, IEEE Transactions on Knowledge and Data Engineering.

[246]  Pat Langley,et al.  Static Versus Dynamic Sampling for Data Mining , 1996, KDD.

[247]  Jiawei Han,et al.  Mining recurrent items in multimedia with progressive resolution refinement , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[248]  Kenneth A. Ross,et al.  Fast Computation of Sparse Datacubes , 1997, VLDB.

[249]  Zhaohui Tang,et al.  Data Mining with SQL Server 2005 , 2005 .

[250]  Laks V. S. Lakshmanan,et al.  QC-trees: an efficient summary structure for semantic OLAP , 2003, SIGMOD '03.

[251]  Andrzej Skowron,et al.  The Discernibility Matrices and Functions in Information Systems , 1992, Intelligent Decision Support.

[252]  Daniel A. Keim,et al.  An Efficient Approach to Clustering in Large Multimedia Databases with Noise , 1998, KDD.

[253]  David E. Goldberg,et al.  Genetic Algorithms in Search Optimization and Machine Learning , 1988 .

[254]  Stuart J. Russell,et al.  BLOG: Probabilistic Models with Unknown Objects , 2005, IJCAI.

[255]  Valdis E. Krebs,et al.  Mapping Networks of Terrorist Cells , 2001 .

[256]  V. Barnett,et al.  Applied Linear Statistical Models , 1975 .

[257]  Agnès Voisard,et al.  Spatial Databases: With Application to GIS , 2001 .

[258]  Christos Faloutsos,et al.  Graphs over time: densification laws, shrinking diameters and possible explanations , 2005, KDD '05.

[259]  Jiawei Han,et al.  Efficient mining of partial periodic patterns in time series database , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[260]  Madhuri S. Mulekar Data Mining: Multimedia, Soft Computing, and Bioinformatics , 2004, Technometrics.

[261]  Tong Zhang,et al.  Text Mining: Predictive Methods for Analyzing Unstructured Information , 2004 .

[262]  Phyllis Koton,et al.  Reasoning about Evidence in Causal Explanations , 1988, AAAI.

[263]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[264]  Jude W. Shavlik,et al.  Extracting Refined Rules from Knowledge-Based Neural Networks , 1993, Machine Learning.

[265]  George H. John Behind-the-scenes data mining: a report on the KDD-98 panel , 1999, SKDD.

[266]  Jeffrey D. Ullman,et al.  Implementing data cubes efficiently , 1996, SIGMOD '96.

[267]  Jiong Yang,et al.  SPIN: mining maximal frequent subgraphs from graph databases , 2004, KDD.

[268]  Randy Kerber,et al.  ChiMerge: Discretization of Numeric Attributes , 1992, AAAI.

[269]  D. Krane,et al.  Fundamental Concepts of Bioinformatics , 2002 .

[270]  John A. Major,et al.  Selecting among rules induced from a hurricane database , 1993, Journal of Intelligent Information Systems.

[271]  Christos Faloutsos,et al.  Online data mining for co-evolving time sequences , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[272]  Janet L. Kolodner,et al.  Case-Based Reasoning , 1988, IJCAI 1989.

[273]  George Kollios,et al.  Mining, indexing, and querying historical spatiotemporal data , 2004, KDD.

[274]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[275]  Jiawei Han,et al.  Efficient Polygon Amalgamation Methods for Spatial OLAP and Spatial Data Mining , 1999, SSD.

[276]  Goetz Graefe,et al.  Multi-table joins through bitmapped join indices , 1995, SGMD.

[277]  Anthony K. H. Tung,et al.  Carpenter: finding closed patterns in long biological datasets , 2003, KDD '03.

[278]  Kenneth A. Ross,et al.  Complex Aggregation at Multiple Granularities , 1998, EDBT.

[279]  Jiawei Han,et al.  Data-Driven Discovery of Quantitative Rules in Relational Databases , 1993, IEEE Trans. Knowl. Data Eng..

[280]  Michel Manago,et al.  Induction of Decision Trees from Complex Structured Data , 1991, Knowledge Discovery in Databases.

[281]  Duncan J. Watts,et al.  Six Degrees: The Science of a Connected Age , 2003 .

[282]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1991 .

[283]  Jiawei Han,et al.  Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes , 1997, KDD.

[284]  Divesh Srivastava,et al.  Answering Queries with Aggregation Using Views , 1996, VLDB.

[285]  Alberto O. Mendelzon,et al.  Querying the World Wide Web , 1996, Fourth International Conference on Parallel and Distributed Information Systems.

[286]  Lawrence B. Holder,et al.  Knowledge discovery in molecular biology: Identifying structural regularities in proteins , 1999, Intell. Data Anal..

[287]  Philip S. Yu,et al.  An effective hash-based algorithm for mining association rules , 1995, SIGMOD '95.

[288]  Yannis E. Ioannidis,et al.  Selectivity Estimation Without the Attribute Value Independence Assumption , 1997, VLDB.

[289]  R. Higgins Analysis for Financial Management , 2004 .

[290]  Erik Thomsen,et al.  OLAP Solutions - Building Multidimensional Information Systems , 1997 .

[291]  Barbara Hubbard,et al.  The World According to Wavelets , 1996 .

[292]  Daniel S. Hirschberg,et al.  The Time Complexity of Decision Tree Induction , 1995 .

[293]  Johannes Gehrke,et al.  RainForest - A Framework for Fast Decision Tree Construction of Large Datasets , 1998, Data Mining and Knowledge Discovery.

[294]  N. Mati,et al.  Discovering Informative Patterns and Data Cleaning , 1996 .

[295]  Helen J. Wang,et al.  Online aggregation , 1997, SIGMOD '97.

[296]  Ramakrishnan Srikant,et al.  Mining Association Rules with Item Constraints , 1997, KDD.

[297]  Dimitrios Gunopulos,et al.  On-Line Discovery of Dense Areas in Spatio-temporal Databases , 2003, SSTD.

[298]  Wei Li,et al.  New parallel algorithms for fast discovery of association rules , 1997 .

[299]  Thomas Hofmann,et al.  Probabilistic latent semantic indexing , 1999, SIGIR '99.

[300]  Kristian G. Olesen,et al.  Practical Issues in Modeling Large Diagnostic Systems with Multiply Sectioned Bayesian Networks , 2000, Int. J. Pattern Recognit. Artif. Intell..

[301]  Tariq Samad,et al.  Designing Application-Specific Neural Networks Using the Genetic Algorithm , 1989, NIPS.

[302]  Raymond J. Mooney,et al.  Content-boosted collaborative filtering for improved recommendations , 2002, AAAI/IAAI.

[303]  J. Snoeyink,et al.  Mining Spatial Motifs from Protein Structure Graphs , 2003 .

[304]  J. R. Quinlan Learning Logical Definitions from Relations , 1990 .

[305]  Laks V. S. Lakshmanan,et al.  Exploratory mining and pruning optimizations of constrained associations rules , 1998, SIGMOD '98.

[306]  Sudipto Guha,et al.  Streaming-data algorithms for high-quality clustering , 2002, Proceedings 18th International Conference on Data Engineering.

[307]  Rakesh Agrawal,et al.  SPRINT: A Scalable Parallel Classifier for Data Mining , 1996, VLDB.

[308]  Yoshua Bengio,et al.  Pattern Recognition and Neural Networks , 1995 .

[309]  Lise Getoor,et al.  Link-Based Classification , 2003, Encyclopedia of Machine Learning and Data Mining.

[310]  Dimitrios Gunopulos,et al.  Efficient Mining of Spatiotemporal Patterns , 2001, SSTD.

[311]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[312]  F. A. Seiler,et al.  Numerical Recipes in C: The Art of Scientific Computing , 1989 .

[313]  Jiawei Han,et al.  Towards on-line analytical mining in large databases , 1998, SGMD.

[314]  Christos Faloutsos,et al.  Advanced Database Systems , 1997, Lecture Notes in Computer Science.

[315]  Sudipto Guha,et al.  Clustering Data Streams , 2000, FOCS.

[316]  I. Kononenko,et al.  Attribute Selection for Modeling , 1997 .

[317]  Padhraic Smyth,et al.  An Information Theoretic Approach to Rule Induction from Databases , 1992, IEEE Trans. Knowl. Data Eng..

[318]  R. Mike Cameron-Jones,et al.  FOIL: A Midterm Report , 1993, ECML.

[319]  Jiawei Han,et al.  BIDE: efficient mining of frequent closed sequences , 2004, Proceedings. 20th International Conference on Data Engineering.

[320]  Andrew W. Moore,et al.  Tractable group detection on large link data sets , 2003, Third IEEE International Conference on Data Mining.

[321]  Jiawei Han,et al.  Generalization-Based Data Mining in Object-Oriented Databases Using an Object Cube Model , 1998, Data Knowl. Eng..

[322]  J. Ross Quinlan,et al.  Simplifying Decision Trees , 1987, Int. J. Man Mach. Stud..

[323]  Lotfi A. Zadeh,et al.  Commonsense Knowledge Representation Based on Fuzzy Logic , 1983, Computer.

[324]  Chen Wang,et al.  Scalable mining of large disk-based graph databases , 2004, KDD.

[325]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[326]  J. Ross Quinlan,et al.  Unknown Attribute Values in Induction , 1989, ML.

[327]  Heikki Mannila,et al.  Efficient Algorithms for Discovering Association Rules , 1994, KDD Workshop.

[328]  Dimitrios Gunopulos,et al.  Discovering similar multidimensional trajectories , 2002, Proceedings 18th International Conference on Data Engineering.

[329]  Umeshwar Dayal,et al.  FreeSpan: frequent pattern-projected sequential pattern mining , 2000, KDD '00.

[330]  Kyuseok Shim,et al.  SPIRIT: Sequential Pattern Mining with Regular Expression Constraints , 1999, VLDB.

[331]  Ivan Bratko,et al.  Machine Learning and Data Mining; Methods and Applications , 1998 .

[332]  W. Scott Spangler,et al.  Learning Useful Rules from Inconclusive Data , 1991, Knowledge Discovery in Databases.

[333]  Raúl E. Valdés-Pérez,et al.  Principles of Human Computer Collaboration for Knowledge Discovery in Science , 1999, Artif. Intell..

[334]  Raymond T. Ng,et al.  A Unified Notion of Outliers: Properties and Computation , 1997, KDD.

[335]  Jiawei Han,et al.  Discovering Web access patterns and trends by applying OLAP and data mining technology on Web logs , 1998, Proceedings IEEE International Forum on Research and Technology Advances in Digital Libraries -ADL'98-.

[336]  Hongjun Lu,et al.  On computing, storing and querying frequent patterns , 2003, KDD '03.

[337]  Veda C. Storey,et al.  A Framework for Analysis of Data Quality Research , 1995, IEEE Trans. Knowl. Data Eng..

[338]  Jiawei Han,et al.  MultiMediaMiner: a system prototype for multimedia data mining , 1998, SIGMOD '98.

[339]  Richard Y. Wang,et al.  Anchoring data quality dimensions in ontological foundations , 1996, CACM.

[340]  Aidong Zhang,et al.  WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases , 1998, VLDB.

[341]  Mathias Kirsten,et al.  Relational Distance-Based Clustering , 1998, ILP.

[342]  James L. McClelland Parallel Distributed Processing , 2005 .

[343]  Joseph M. Hellerstein,et al.  Potter's Wheel: An Interactive Data Cleaning System , 2001, VLDB.

[344]  Peter J. Haas,et al.  Interactive Data Analysis: The Control Project , 1999, Computer.

[345]  S. Muthukrishnan,et al.  Mining Deviants in a Time Series Database , 1999, VLDB.

[346]  Dorian Pyle,et al.  Data Preparation for Data Mining , 1999 .

[347]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[348]  Jiawei Han,et al.  Discovery of Spatial Association Rules in Geographic Information Databases , 1995, SSD.

[349]  Ke Wang,et al.  Mining frequent item sets by opportunistic projection , 2002, KDD.

[350]  Jiawei Han,et al.  MM-Cubing: computing Iceberg cubes by factorizing the lattice space , 2004, Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004..

[351]  J. A. Swets,et al.  Measuring the accuracy of diagnostic systems , 1988, Science.

[352]  Michael Stonebraker,et al.  Monitoring Streams - A New Class of Data Management Applications , 2002, VLDB.

[353]  Avi Pfeffer,et al.  SPOOK: A system for probabilistic object-oriented knowledge representation , 1999, UAI.

[354]  Paul E. Utgoff,et al.  ID5: An Incremental ID3 , 1987, ML.

[355]  Jiawei Han,et al.  CoMine: efficient mining of correlated patterns , 2003, Third IEEE International Conference on Data Mining.

[356]  Xifeng Yan,et al.  CloSpan: Mining Closed Sequential Patterns in Large Datasets , 2003, SDM.

[357]  J. Nadal,et al.  Learning in feedforward layered networks: the tiling algorithm , 1989 .

[358]  Stephen Northcutt,et al.  Network intrusion detection , 2003 .

[359]  Ryszard S. Michalski,et al.  AQ15: Incremental Learning of Attribute-Based Descriptions from Examples: The Method and User's Guide , 1986 .

[360]  F. Rosenblatt,et al.  The perceptron: a probabilistic model for information storage and organization in the brain , 1958, Psychological Review.

[361]  Jiawei Han,et al.  Discovery of Multiple-Level Association Rules from Large Databases , 1995, VLDB.

[362]  Jiawei Han,et al.  Exploration of the power of attribute-oriented induction in data mining , 1995, KDD 1995.

[363]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[364]  Kevin D. Ashley,et al.  A case-based system for trade secrets law , 1987, ICAIL '87.

[365]  Benjamin Van Roy,et al.  Solving Data Mining Problems Through Pattern Recognition , 1997 .

[366]  David T. Jones,et al.  Bioinformatics: Genes, Proteins and Computers , 2007 .

[367]  Igor Kononenko,et al.  On Biases in Estimating Multi-Valued Attributes , 1995, IJCAI.

[368]  Wynne Hsu,et al.  Integrating Classification and Association Rule Mining , 1998, KDD.

[369]  Giulia Pagallo,et al.  Learning DNF by Decision Trees , 1989, IJCAI.

[370]  Gösta Grahne,et al.  Efficiently Using Prefix-trees in Mining Frequent Itemsets , 2003, FIMI.

[371]  Jorma Rissanen,et al.  SLIQ: A Fast Scalable Classifier for Data Mining , 1996, EDBT.

[372]  C. J. Huberty,et al.  Applied Discriminant Analysis , 1994 .

[373]  Edward R. Tufte,et al.  Envisioning Information , 1990 .

[374]  Sreerama K. Murthy,et al.  Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey , 1998, Data Mining and Knowledge Discovery.

[375]  Qiming Chen,et al.  PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth , 2001, Proceedings 17th International Conference on Data Engineering.

[376]  Anthony K. H. Tung,et al.  Constraint-based clustering in large databases , 2001, ICDT.

[377]  Bart Goethals,et al.  FIMI'03: Workshop on Frequent Itemset Mining Implementations , 2003 .

[378]  Stephen Jose Hanson,et al.  Minkowski-r Back-Propagation: Learning in Connectionist Models with Non-Euclidian Error Signals , 1987, NIPS.

[379]  Christopher K. Riesbeck,et al.  Inside Case-Based Reasoning , 1989 .

[380]  Teuvo Kohonen,et al.  Self-organization and associative memory: 3rd edition , 1989 .

[381]  Jiawei Han,et al.  DBMiner: A System for Mining Knowledge in Large Relational Databases , 1996, KDD.

[382]  Mohammed J. Zaki Efficient enumeration of frequent sequences , 1998, CIKM '98.

[383]  Raymond T. Ng,et al.  Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining , 1996, IEEE Trans. Knowl. Data Eng..

[384]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[385]  Saul Greenberg,et al.  How people revisit web pages: empirical findings and implications for the design of history systems , 1997, Int. J. Hum. Comput. Stud..

[386]  Jiawei Han,et al.  Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration , 2003, Very Large Data Bases Conference.

[387]  Madhu Sudan,et al.  A statistical perspective on data mining , 1997, Future Gener. Comput. Syst..

[388]  Ralph Kimball,et al.  The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling , 1996 .

[389]  Sharon L. Milgram,et al.  The Small World Problem , 1967 .

[390]  Sung-Hyon Myaeng,et al.  A practical hypertext categorization method using links and incrementally available class information , 2000, SIGIR '00.

[391]  Heikki Mannila,et al.  The power of sampling in knowledge discovery , 1994, PODS '94.

[392]  Cheng Yang,et al.  Efficient discovery of error-tolerant frequent itemsets in high dimensions , 2001, KDD '01.

[393]  W. Loh,et al.  Split Selection Methods for Classification Trees , 1997 .

[394]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[395]  Inderpal Singh Mumick,et al.  Selection of Views to Materialize in a Data Warehouse , 2005, IEEE Trans. Knowl. Data Eng..

[396]  Jiawei Han,et al.  Classifying large data sets using SVMs with hierarchical clusters , 2003, KDD '03.

[397]  Richard M. Karp,et al.  A simple algorithm for finding frequent elements in streams and bags , 2003, TODS.

[398]  Oren Etzioni,et al.  Adaptive Web Sites: Conceptual Cluster Mining , 1999, IJCAI.

[399]  Jon M. Kleinberg,et al.  Applications of linear algebra in information retrieval and hypertext analysis , 1999, PODS '99.

[400]  Philip S. Yu,et al.  Mining concept-drifting data streams using ensemble classifiers , 2003, KDD '03.

[401]  S. Muthukrishnan,et al.  Data streams: algorithms and applications , 2005, SODA '03.

[402]  Paul S. Bradley,et al.  Compressed data cubes for OLAP aggregate query approximation on continuous dimensions , 1999, KDD '99.

[403]  G. V. Kass An Exploratory Technique for Investigating Large Quantities of Categorical Data , 1980 .

[404]  Federico Girosi,et al.  An improved training algorithm for support vector machines , 1997, Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop.

[405]  Jiawei Han,et al.  TFP: an efficient algorithm for mining top-k frequent closed itemsets , 2005, IEEE Transactions on Knowledge and Data Engineering.

[406]  Heikki Mannila,et al.  Finding interesting rules from large sets of discovered association rules , 1994, CIKM '94.

[407]  Hiroshi Motoda,et al.  Feature Selection for Knowledge Discovery and Data Mining , 1998, The Springer International Series in Engineering and Computer Science.

[408]  F. Ramsey,et al.  The statistical sleuth : a course in methods of data analysis , 2002 .

[409]  D. Watts,et al.  Small Worlds: The Dynamics of Networks between Order and Randomness , 2001 .

[410]  Vladimir Vapnik,et al.  On the uniform convergence of relative frequencies of events to their probabilities , 1971 .

[411]  H. Margolis Visual explanations: Images and quantities, evidence and narrative , 1998 .

[412]  Tom M. Mitchell,et al.  Generalization as Search , 2002 .

[413]  Jiawei Han,et al.  An Efficient Two-Step Method for Classification of Spatial Data , 1998 .

[414]  Thorsten Joachims,et al.  A statistical learning model of text classification for support vector machines , 2001, SIGIR '01.

[415]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[416]  Sunita Sarawagi,et al.  Intelligent Rollups in Multidimensional OLAP Data , 2001, VLDB.