A Two-Step Bayes Method for Spatial Drift in Consumer Classification

In data mining, spatial drift refers to the situation in which the data used to develop a model cover only one part of the population, while the differences among samples, or between a sample and the population, are unknown. This paper proposes a two-step Bayes method that improves adaptability to samples from different regions while maintaining high model accuracy. The method first groups regions by similarity. It then sets a model structure, without parameters, from the population or from large samples with good data quality, and finally trains the parameters using samples from the same region group. Applying the method to build an estimation model shows that it can, to some extent, resolve the uncertainty of consumer classification.
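The abstract describes the procedure only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes categorical features and a naive Bayes model whose "structure" is the set of class-to-feature edges kept by a mutual-information threshold on the large sample. It also assumes a greedy region grouping based on the mean total variation distance between per-feature value distributions, and Laplace-smoothed parameter estimation within each region group. Every row is a hypothetical `(features, label)` pair; all function names, thresholds, and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def tv_distance(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def value_dist(rows, j):
    """Empirical distribution of feature j over (features, label) rows."""
    counts = Counter(x[j] for x, _ in rows)
    n = sum(counts.values())
    return {v: c / n for v, c in counts.items()}


def group_regions(data_by_region, threshold):
    """Assumed grouping rule: join the first group whose seed region is within
    `threshold` mean TV distance over all features, else seed a new group."""
    groups = []
    for region, rows in data_by_region.items():
        n_feat = len(rows[0][0])
        for g in groups:
            seed = data_by_region[g[0]]
            d = sum(tv_distance(value_dist(rows, j), value_dist(seed, j))
                    for j in range(n_feat)) / n_feat
            if d <= threshold:
                g.append(region)
                break
        else:
            groups.append([region])
    return groups


def mutual_information(rows, j):
    """Empirical I(X_j; Y) in nats."""
    n = len(rows)
    joint = Counter((x[j], y) for x, y in rows)
    px = Counter(x[j] for x, _ in rows)
    py = Counter(y for _, y in rows)
    return sum(c / n * math.log(c * n / (px[v] * py[y]))
               for (v, y), c in joint.items())


def learn_structure(large_rows, min_mi):
    """Step 1: keep class->feature edges with MI above `min_mi` and record
    value domains from the large sample; no probabilities are stored."""
    n_feat = len(large_rows[0][0])
    keep = [j for j in range(n_feat)
            if mutual_information(large_rows, j) > min_mi]
    domains = {j: sorted({x[j] for x, _ in large_rows}) for j in keep}
    classes = sorted({y for _, y in large_rows})
    return {"features": keep, "domains": domains, "classes": classes}


def fit_parameters(structure, group_rows, alpha=1.0):
    """Step 2: estimate probability tables from one region group's sample,
    Laplace-smoothed over the domains fixed in step 1."""
    classes = structure["classes"]
    class_counts = Counter(y for _, y in group_rows)
    n = len(group_rows)
    prior = {c: (class_counts[c] + alpha) / (n + alpha * len(classes))
             for c in classes}
    cond = {}
    for j in structure["features"]:
        dom = structure["domains"][j]
        counts = Counter((x[j], y) for x, y in group_rows)
        for c in classes:
            denom = class_counts[c] + alpha * len(dom)
            for v in dom:
                cond[(j, v, c)] = (counts[(v, c)] + alpha) / denom
    return prior, cond


def predict(structure, params, x):
    """Return the class with the highest log-posterior score."""
    prior, cond = params
    def score(c):
        s = math.log(prior[c])
        for j in structure["features"]:
            p = cond.get((j, x[j], c))
            if p is not None:  # value unseen in step 1: ignore this feature
                s += math.log(p)
        return s
    return max(structure["classes"], key=score)


if __name__ == "__main__":
    # Hypothetical toy data: regions A and B share a pattern, C differs.
    same = [(("hi", "young"), "buy"), (("hi", "old"), "buy"),
            (("lo", "young"), "no"), (("lo", "old"), "no")]
    data = {"A": same, "B": list(same),
            "C": [(("hi", "young"), "no")] * 2 + [(("hi", "old"), "buy")] * 2}
    groups = group_regions(data, threshold=0.1)  # [['A', 'B'], ['C']]
    pooled = [row for rows in data.values() for row in rows]  # "large sample"
    structure = learn_structure(pooled, min_mi=0.01)  # keeps features 0 and 1
    for g in groups:  # prints "['A', 'B'] buy", then "['C'] no"
        params = fit_parameters(structure, [r for reg in g for r in data[reg]])
        print(g, predict(structure, params, ("hi", "young")))
```

In this sketch the structure is fixed once on the large sample and only the probabilities are re-estimated per region group. Each group's model therefore needs only enough regional data to fill its tables, yet the same consumer profile can still be classified differently where regional behaviour differs, as it is for groups A+B and C in the toy data.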
