Data mining from 1994 to 2004: an application-orientated review

Data mining, also known as knowledge discovery, is one of the most popular topics in information technology. It is the process of automatically extracting useful information from large databases, and it promises to uncover hidden relationships within them. These relationships represent valuable knowledge that is crucial for many applications. This paper reviews work on current applications of data mining in four main areas: bioinformatics, information retrieval, adaptive hypermedia and electronic commerce. For each area, we describe how data mining can enhance its functions. Readers should gain an overview of the state-of-the-art research associated with these applications. Finally, we identify the limitations of current work and suggest several directions for future research.
