Paper information - Exploring data mining implementation

To uncover relationships in data, statistical techniques such as factor analysis have been used in the past. Though traditional statistical techniques continue to be useful and effective for problems involving small data sets and a manageable number of variables, they run into a scalability roadblock when applied to problems where millions of records and thousands of variables exist. Data mining is thus emerging as a class of analytical techniques that go beyond statistics and aim at examining large quantities of data. What is important to keep in mind is that the problems associated with data mining are fundamentally statistical in nature; that is, they involve inferring patterns or models from data. In essence, data mining is an umbrella term for a wide variety of techniques that examine large quantities of data in search of easy-to-overlook relationships or hints that prove to have business or scientific value. A practical and applied definition of data mining is: the analysis and non-trivial extraction of data from databases for the purpose of discovering new and valuable information, in the form of patterns and rules, from relationships between data elements.
Data mining is receiving widespread attention in the academic and public press literature [5], and case studies and anecdotal evidence [9] suggest that companies are increasingly investigating the potential of data mining technology to deliver competitive advantage. Interest in data mining does not appear to be waning and, at a minimum, its use in current application areas such as direct target marketing campaigns, fraud detection, and the development of models to aid in financial predictions will only intensify. According to the Palo Alto Management Group, the data mining segment is one of the fastest growing in the entire Business Intelligence market.
As a multidisciplinary field, data mining draws from areas such as artificial intelligence, database theory, data visualization, marketing, mathematics, operations research, pattern recognition, and statistics. Research into data mining has thus far focused on developing new algorithms [1] and on identifying future application areas [6]. Though research into both data mining technology and future application areas is important, the fundamental question in the minds of many early adopters is how to perform data mining. It is likely this question will take on greater importance as data mining becomes viewed as an …

Knowledge is the only factor of production that is not subject to diminishing returns. This inalienable truth is especially important in …

Karim K. Hirji

[1] Philip S. Yu, et al. Data Mining: An Overview from a Database Perspective, 1996, IEEE Trans. Knowl. Data Eng.

[2] Izak Benbasat, et al. The Case Research Strategy in Studies of Information Systems, 1987, MIS Q.

[3] Keinosuke Fukunaga, et al. Introduction to statistical pattern recognition (2nd ed.), 1990.

[4] David Haussler, et al. Mining scientific data, 1996, CACM.

[5] Heikki Mannila, et al. Fast Discovery of Association Rules, 1996, Advances in Knowledge Discovery and Data Mining.

[6] Karim K. Hirji, et al. Discovering data mining: from concept to implementation, 1999, SKDD.

[7] Jiawei Han, et al. Data-Driven Discovery of Quantitative Rules in Relational Databases, 1993, IEEE Trans. Knowl. Data Eng.

[8] Vasudha Bhatnagar, et al. On Mining of Data, 2001.