Exploratory Multilevel Hot Spot Analysis: Australian Taxation Office Case Study

Population-based real-life datasets often contain small clusters of unusual sub-populations. Although these clusters, called 'hot spots', are small and sparse, they are usually of special interest to an analyst. In this paper we introduce a visual drill-down approach based on the Self-Organizing Map (SOM) to explore the characteristics of such hot spots in real-life datasets. Iterative clustering algorithms (such as k-means) and the SOM are not designed to show these small and sparse clusters in detail. The feasibility of our approach is demonstrated using a large real-life dataset from the Australian Taxation Office.
