Data Mining for Direct Marketing: Problems and Solutions

Direct marketing is the process of identifying likely buyers of certain products and promoting those products to them accordingly. It is increasingly used by banks, insurance companies, and the retail industry. Data mining can provide an effective tool for direct marketing, but several specific problems arise when it is applied. For example, the class distribution is extremely imbalanced (the response rate is about 1%), predictive accuracy is no longer a suitable measure for evaluating learning methods, and the number of examples can be too large. In this paper, we discuss methods of coping with these problems, based on our experience with direct-marketing projects that use data mining.
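The first two problems can be made concrete with a small, hypothetical example. At a 1% response rate, a classifier that predicts "no response" for every customer is 99% accurate but selects nobody to mail. A ranking-based measure such as lift compares the response rate among the top-scored customers with the overall rate, so it captures the quantity a mailing campaign actually cares about. The sketch below uses a generic top-fraction lift as an illustration. It is not necessarily the evaluation measure the authors use, and the helpers `accuracy` and `lift_at`, as well as all of the data, are assumptions made up for this example.

```python
"""Illustrative sketch: plain accuracy vs. lift on an imbalanced mailing list.

All data below is synthetic and hypothetical; it is not taken from the paper.
"""


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def lift_at(y_true, scores, fraction):
    """Response rate in the top `fraction` of customers (ranked by score,
    highest first) divided by the overall response rate."""
    n = len(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(n * fraction))
    top_hits = sum(label for _, label in ranked[:k])
    total_hits = sum(y_true)
    # (top_hits / k) / (total_hits / n), rearranged so only one division happens
    return (top_hits * n) / (k * total_hits)


# Hypothetical list: 10,000 customers, 100 responders (a 1% response rate).
y_true = [1] * 100 + [0] * 9900

# Predicting "no response" for everyone is 99% accurate yet useless for mailing.
all_negative = [0] * len(y_true)
baseline_acc = accuracy(y_true, all_negative)

# Hypothetical scorer: half the responders score highest, the other half lowest;
# 950 non-responders also score high and the remaining 8,950 score low.
scores = [0.9] * 50 + [0.1] * 50 + [0.8] * 950 + [0.2] * 8950
top_decile_lift = lift_at(y_true, scores, 0.10)

print(baseline_acc)     # 0.99
print(top_decile_lift)  # 5.0: the top 10% holds 50 of the 100 responders
```

Mailing only the top-scored 10% of this hypothetical list would reach half of all responders, five times the response rate of a random mailing, while accuracy gives no credit for that ranking at all. Lift over the whole list is always 1.0, which is why it is reported for a top fraction.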