Efficient big data analysis on a single machine using Apache Spark and self-organizing map libraries

Apache Spark is commonly used as a big data analytical platform on powerful computer clusters because it primarily uses main memory for evaluation. Our approach adds self-organizing map (SOM) software libraries to a single big data analytical stack, and it is efficient and fast enough even on a standard single computer. This makes big data analysis available to researchers with limited resources. We confirmed the idea experimentally and describe it here. As a case study for the method, we used the available #Brexit data: we performed sentiment analysis of the corresponding tweets and correlated the results with stock exchange data.
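The abstract does not say which SOM library is used or how it is wired into Spark. As a rough illustration of what such a library computes, here is a minimal pure-Python sketch of Kohonen's online SOM training rule: find the best-matching unit (BMU) for each sample, then pull the BMU and its grid neighbours toward the sample with a Gaussian neighbourhood weight. The helper names `train_som` and `best_matching_unit`, the linear decay schedules, and the toy two-cluster data are hypothetical choices for illustration, not the authors' implementation or any library's API.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Hypothetical helper (not the paper's SOM library): online Kohonen SOM on a rows x cols grid."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0  # initial neighbourhood radius, in grid units
    # Initialize every node with a copy of a randomly chosen training vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Squared distance between this node and the BMU on the map grid.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood weight
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])  # pull the node toward the sample
            step += 1
    return weights


# Usage on toy data: two tight clusters of 2-D points (illustrative, not the Brexit data).
offsets = [(-0.2, 0.1), (0.1, -0.1), (0.0, 0.2), (0.2, 0.0), (-0.1, -0.2)]
data = [(dx, dy) for dx, dy in offsets] + [(10.0 + dx, 10.0 + dy) for dx, dy in offsets]
som = train_som(data, rows=1, cols=2, seed=42)
bmu_a = best_matching_unit(som, (0.0, 0.0))
bmu_b = best_matching_unit(som, (10.0, 10.0))
print(bmu_a, bmu_b)  # the two clusters are expected to map to different nodes
```

Each update moves a weight vector part of the way toward a sample, so the trained weights stay inside the range of the input data. In a Spark setting, the BMU search over a partition of feature vectors is the part that parallelizes naturally; how the authors' stack distributes it is not stated in the abstract.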

Ioannis Anagnostopoulos | Petr Saloun | David Andresic
