Clustering with Apache Hadoop

The self-organizing map (SOM) is an unsupervised neural network that projects high-dimensional data onto a low-dimensional grid while preserving the topological order of the original data, so that this structure can be inspected visually. This makes the SOM an excellent tool for the exploratory phase of data mining, and self-organizing maps have been applied successfully in many fields, including engineering and business. Experimental results on a census database illustrate the resulting clusters. The paper proposes to improve the performance of SOM clustering by using cloud computing. The approach is based on Apache Hadoop, a Java-based software framework that distributes processing over a cluster of machines by providing an open-source implementation of MapReduce, a programming model designed for processing and transforming very large data sets.
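To make the MapReduce angle concrete, here is a minimal single-process sketch of one common way to parallelise SOM training, the batch SOM, with the map, shuffle and reduce phases simulated in plain Python. It is not presented as the paper's method. The function names (`map_phase`, `reduce_phase`, `update_weights`, `train_som`), the Gaussian neighbourhood, the linear shrinking of the radius `sigma` and the random initialisation are all assumptions made for illustration. The mapper assigns each record to its best-matching unit (BMU), the reducer sums the records per BMU (S_c) and counts them (n_c), and a driver recomputes each neuron's weight as w_j = sum_c h(j,c) S_c / sum_c h(j,c) n_c, where h is the neighbourhood function on the map grid.

```python
import math
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(x, weights):
    """Index of the neuron whose weight vector is closest to x (ties -> lowest index)."""
    return min(range(len(weights)), key=lambda j: sq_dist(x, weights[j]))


def map_phase(records, weights):
    """Hypothetical mapper: for every input record emit (bmu_index, (record, 1))."""
    for x in records:
        yield best_matching_unit(x, weights), (x, 1)


def reduce_phase(pairs):
    """Hypothetical shuffle + reducer: per BMU, sum the assigned vectors and count them."""
    sums, counts = {}, defaultdict(int)
    for key, (x, n) in pairs:
        acc = sums.setdefault(key, [0.0] * len(x))
        sums[key] = [a + v for a, v in zip(acc, x)]
        counts[key] += n
    return sums, counts


def update_weights(weights, grid, sums, counts, sigma):
    """Driver: batch-SOM update using an (assumed) Gaussian neighbourhood on the grid."""
    new_weights = []
    for j, w in enumerate(weights):
        num, den = [0.0] * len(w), 0.0
        for c, s in sums.items():
            h = math.exp(-sq_dist(grid[j], grid[c]) / (2 * sigma ** 2))
            num = [a + h * v for a, v in zip(num, s)]
            den += h * counts[c]
        # A neuron with no neighbourhood mass keeps its previous weight.
        new_weights.append([a / den for a in num] if den > 0 else list(w))
    return new_weights


def train_som(data, rows, cols, epochs=10, sigma0=1.0, seed=0):
    """Train a rows x cols map; assumes len(data) >= rows * cols."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumed initialisation: each neuron starts at a distinct, randomly sampled record.
    weights = [list(x) for x in rng.sample(data, len(grid))]
    for t in range(epochs):
        # Assumed schedule: shrink the radius linearly, but keep it positive.
        sigma = max(sigma0 * (1 - t / epochs), 0.1)
        # One epoch corresponds to one MapReduce pass over the whole data set.
        sums, counts = reduce_phase(map_phase(data, weights))
        weights = update_weights(weights, grid, sums, counts, sigma)
    return grid, weights
```

For instance, `train_som(points, 2, 2)` on a list of 2-D points returns the grid coordinates and four trained weight vectors. The batch formulation suits MapReduce because each epoch only needs per-BMU sums and counts. With a combiner performing the same summation on each mapper's output, the data shuffled per epoch grows with the number of neurons rather than with the number of records. In an actual Hadoop job, each epoch would be a separate job, and the current weights would have to be shipped to every mapper, for example through Hadoop's distributed cache.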
