Efficient big data analysis on a single machine using Apache Spark and self-organizing map libraries

Apache Spark is commonly used as a big data analytics platform on powerful computer clusters, as it primarily uses main memory for computation. Our approach adds self-organizing map software libraries to a single big data analytics stack and is efficient and fast enough to run even on a standard single computer. This brings big data analysis within reach of researchers with limited resources. The idea was confirmed experimentally and is described here. As a case study for our method, we used the available #Brexit data, performing sentiment analysis of the corresponding tweets and examining its correlation with stock exchange data.
