Big Streaming Data Sampling and Optimization

This research addresses and resolves issues with the confidence level of samples drawn from big streaming data, a level that varies with the speed of the stream and with the dynamically changing sample space. Building on preliminary work and results from [8], it focuses on the confidence level and on the threshold of the dynamic population size, in order to ensure a better confidence level for the sampled data with respect to several variables: the speed of the streaming data, the population size as it changes over time, the sample space (or size), the speed of the sampling algorithm, the size of the streaming data, and the duration of the stream. Theoretical thresholds for processing big streaming data are identified with respect to these variables, with the goal of optimization. Simulation and experimental results are provided to validate the efficacy of the proposed theoretical thresholds.
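The abstract does not state the thresholds themselves. As a rough illustration of how confidence level, population size, and sample size interact when the population grows with the stream, the following Python sketch uses the textbook Cochran sample-size formula with a finite population correction. This is an assumed stand-in, not the method proposed in this research; the function names, the constant arrival rate, and the default margin of error are all hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Sample size needed to estimate a proportion at a given confidence level.

    Assumption: Cochran's formula with finite population correction,
    used here only as an illustration (not the paper's thresholds).
    p = 0.5 is the most conservative (largest-variance) choice.
    """
    if population <= 0:
        return 0
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1.0 - p) / margin ** 2
    # Correct for the finite population seen so far.
    n = n0 / (1.0 + (n0 - 1.0) / population)
    return min(population, math.ceil(n))


def population_at(rate_per_sec, seconds):
    """Hypothetical population size: items arrived at a constant rate."""
    return int(rate_per_sec * seconds)


if __name__ == "__main__":
    # The population grows with stream duration; the required sample
    # size grows with it but levels off for large populations.
    for seconds in (1, 10, 60, 600):
        n_items = population_at(1000, seconds)
        print(seconds, n_items, required_sample_size(n_items))
```

Under these assumptions the required sample size rises with the population but flattens once the population is large, which is one reason a threshold on the dynamic population size is a natural quantity to study.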

[1] Leonard Barolli, et al. An Efficient Sampling and Classification Approach for Flow Detection in SDN-Based Big Data Centers, 2017, 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA).

[2] Krishna P. Gummadi, et al. Sampling Content from Online Social Networks, 2015, ACM Trans. Web.

[3] Nagiza F. Samatova, et al. Reservoir-Based Random Sampling with Replacement from Data Stream, 2004, SDM.

[4] Theodore Johnson, et al. Sampling algorithms in a stream operator, 2005, SIGMOD '05.

[5] Vijay Gadepally, et al. Sampling operations on big data, 2015, 2015 49th Asilomar Conference on Signals, Systems and Computers.

[6] Nohpill Park, et al. Big Streaming Data Buffering Optimization, 2016, 2016 4th Intl Conf on Applied Computing and Information Technology/3rd Intl Conf on Computational Science/Intelligence and Applied Informatics/1st Intl Conf on Big Data, Cloud Computing, Data Science & Engineering (ACIT-CSII-BCD).

[7] Charles Teddlie, et al. Mixed Methods Sampling: A Typology With Examples, 2016.

[8] Xiaohua Jia, et al. The Impact of Sampling on Big Data Analysis of Social Media: A Case Study on Flu and Ebola, 2014, 2015 IEEE Global Communications Conference (GLOBECOM).