A Multi-Dimensional Big Data Storing System for Generated COVID-19 Large-Scale Data using Apache Spark

The ongoing outbreak of coronavirus disease (COVID-19) began in Wuhan, China, in December 2019. COVID-19 is caused by a new virus that had not previously been identified in humans. The epidemic then spread rapidly and widely throughout the world. Every day the number of confirmed cases rises rapidly, the number of suspected cases rises based on the symptoms that accompany the disease, and unfortunately the number of deaths rises as well. With these increases around the world, it becomes hard to manage the information for all these cases across their different situations: whether a patient is infected or suspected, and which symptoms the patient has shown. There is therefore a critical need for a multi-dimensional system to store and analyze the large-scale data being generated. In this paper, a Comprehensive Storing System for COVID-19 data using Apache Spark (CSS-COVID) is proposed to handle the problem caused by the daily increase in the number of COVID-19 cases. CSS-COVID helps decrease the processing time for querying and storing daily COVID-19 data. CSS-COVID consists of three stages: inserting and indexing, storing, and querying. In the inserting and indexing stage, the data is divided into subsets and each subset is indexed separately. The storing stage uses a set of storing nodes to store the data, while the querying stage is responsible for handling the querying processes. Using Apache Spark in CSS-COVID improves performance when dealing with the large-scale data of COVID-19 patients, whose number increases daily. A set of experiments is conducted on real COVID-19 datasets to demonstrate the efficiency of CSS-COVID in indexing large-scale data.
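
The three stages named in the abstract can be illustrated with a minimal sketch. It is not the CSS-COVID implementation: it uses plain Python instead of Apache Spark, a sorted list as a stand-in for the index (the abstract does not say which index structure is used), and a hypothetical record layout of (case_id, day, status). All function names are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Hypothetical record layout (an assumption): (case_id, day, status)

def partition(records, num_nodes):
    """Inserting stage, step 1: divide the data into subsets by case_id."""
    subsets = defaultdict(list)
    for rec in records:
        subsets[rec[0] % num_nodes].append(rec)
    return subsets

def build_index(subset):
    """Inserting stage, step 2: index one subset on its own.
    A list sorted by day stands in for the real index structure."""
    ordered = sorted(subset, key=lambda r: r[1])
    days = [r[1] for r in ordered]
    return days, ordered

def store(records, num_nodes):
    """Storing stage: each simulated storing node keeps one indexed subset."""
    subsets = partition(records, num_nodes)
    return {node: build_index(sub) for node, sub in subsets.items()}

def query_day_range(nodes, first_day, last_day):
    """Querying stage: binary-search every node's index and merge the hits."""
    hits = []
    for days, ordered in nodes.values():
        lo = bisect_left(days, first_day)
        hi = bisect_right(days, last_day)
        hits.extend(ordered[lo:hi])
    return sorted(hits)

records = [
    (1, 3, "confirmed"),
    (2, 5, "suspected"),
    (3, 4, "confirmed"),
    (4, 9, "death"),
    (5, 5, "confirmed"),
]
nodes = store(records, num_nodes=2)
result = query_day_range(nodes, 4, 5)
print(result)
```

Because each subset carries its own index, a range query only scans the matching slice on each node rather than the whole dataset. In a Spark setting the per-node loop would presumably run in parallel across partitions.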
