Adapting a Multi-SOM Clustering Algorithm to Large Banking Data

In recent years, Big Data (BD) has attracted researchers in many domains as a new concept that offers opportunities to improve research applications in business, science, and engineering. Big Data Analytics is becoming a practice that many researchers adopt to extract valuable information from BD. This paper presents BD technologies and shows how BD is useful in cluster analysis. A clustering approach named multi-SOM is then studied: a banking dataset is analyzed by integrating the R statistical tool with BD technologies, namely the Hadoop Distributed File System (HDFS), HBase, and MapReduce. The aim is to reduce the execution time of the multi-SOM clustering method when determining the number of clusters using R and Hadoop. The results show the performance obtained by integrating R and Hadoop to handle big data with the multi-SOM clustering algorithm and to overcome the weaknesses of R.
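To make the MapReduce idea concrete, the following is a minimal sketch, in Python rather than R, of how one batch update of a self-organizing map can be split into a map step over data chunks and a reduce step that merges the partial results. It is not the implementation used in this paper. The function names (`bmu`, `map_chunk`, `reduce_partials`), the 1-D grid of units, and the Gaussian neighbourhood with width `sigma` are all assumptions made for illustration, and the multi-SOM stacking of maps and the choice of the number of clusters are not shown.

```python
import math


def bmu(units, x):
    """Return the index of the best-matching unit (smallest squared distance)."""
    return min(
        range(len(units)),
        key=lambda k: sum((w - v) ** 2 for w, v in zip(units[k], x)),
    )


def map_chunk(units, chunk, sigma):
    """Map step: neighbourhood-weighted partial sums for one chunk of records.

    Units are assumed to lie on a 1-D grid, so the grid distance between
    units k and c is simply |k - c| (an illustrative simplification).
    """
    dim = len(units[0])
    num = [[0.0] * dim for _ in units]  # weighted sums of records per unit
    den = [0.0] * len(units)            # sums of neighbourhood weights per unit
    for x in chunk:
        c = bmu(units, x)
        for k in range(len(units)):
            h = math.exp(-((k - c) ** 2) / (2.0 * sigma ** 2))
            den[k] += h
            for j in range(dim):
                num[k][j] += h * x[j]
    return num, den


def reduce_partials(units, partials):
    """Reduce step: merge the chunk partials into updated unit weights."""
    dim = len(units[0])
    updated = []
    for k, old in enumerate(units):
        d = sum(den[k] for _, den in partials)
        if d == 0.0:
            # No weight reached this unit; keep its previous position.
            updated.append(list(old))
        else:
            updated.append(
                [sum(num[k][j] for num, _ in partials) / d for j in range(dim)]
            )
    return updated


if __name__ == "__main__":
    # Toy records standing in for customer feature vectors (made-up values).
    data = [[0.0, 0.0], [0.2, 0.0], [10.0, 10.0], [10.2, 10.0]]
    units = [[1.0, 1.0], [9.0, 9.0]]
    chunks = [data[:2], data[2:]]  # each chunk could be handled by one mapper
    partials = [map_chunk(units, chunk, 0.5) for chunk in chunks]
    print(reduce_partials(units, partials))
```

Because the map step only accumulates sums, the merged result is the same, up to floating-point rounding, however the records are partitioned into chunks. This property is what allows the per-record work to be distributed across nodes.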
