Customer Segmentation Using Clustering and Data Mining Techniques

Clustering is a critically important step in the data mining process. It is a multivariate procedure well suited to segmentation applications in market forecasting and planning research. This paper reports on the use of the k-means clustering technique and the SPSS tool to develop a real-time, online system that predicts sales for a particular supermarket across annual seasonal cycles. The resulting model is an intelligent tool that receives its inputs directly from sales data records and automatically updates the segmentation statistics at the end of each business day. The model was implemented and tested over a period of three months. A total of n = 2138 customers were observed and divided into k = 4 similar groups, with each customer assigned to the group whose mean was nearest. An ANOVA was also carried out to test the stability of the clusters. The actual day-to-day sales statistics were then compared with the statistics predicted by the model. The results were encouraging and showed high accuracy.
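
The nearest-mean classification and the ANOVA check described above can be illustrated with a minimal sketch. This is not the SPSS-based system from the paper: the data are synthetic, the two features (annual spend and visits per month) are hypothetical, and the function names `kmeans` and `anova_f` are introduced here only for illustration. Only n = 2138 and k = 4 are taken from the abstract.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd-style k-means: assign to nearest mean, then recompute means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial means: k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so the means are too
            break
        labels = new_labels
        # Update step: each mean becomes the average of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old mean if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

def anova_f(points, labels, feature):
    """One-way ANOVA F statistic for a single feature across the clusters."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p[feature])
    n = len(points)
    grand_mean = sum(p[feature] for p in points) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups.values())
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))

# Synthetic stand-in for the sales records (assumption: two numeric features,
# hypothetically annual spend and visits per month, drawn around four centres).
rng = random.Random(42)
centres = [(200.0, 2.0), (800.0, 4.0), (1500.0, 8.0), (3000.0, 12.0)]
customers = [(rng.gauss(centres[i % 4][0], 100.0), rng.gauss(centres[i % 4][1], 1.0))
             for i in range(2138)]

centroids, labels = kmeans(customers, k=4)
for j, c in enumerate(centroids):
    print(f"cluster {j}: size={labels.count(j)}, mean={c}")
print("ANOVA F for feature 0:", anova_f(customers, labels, feature=0))
```

A large F value for a feature indicates that the cluster means differ far more than the variation within clusters, which is one common way to read such a check. In practice the features would usually be standardized first, since an unscaled feature with a large range (here, spend) dominates the Euclidean distance.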
