Data Mining: An Overview from a Database Perspective

Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with the potential for major revenue. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information-providing services, such as data warehousing and online services over the Internet, also call for various data mining techniques to better understand user behavior, improve the services provided, and increase business opportunities. In response to this demand, this article surveys recently developed data mining techniques from a database researcher's point of view. It provides a classification of the available data mining techniques and presents a comparative study of them.
