An Intelligent Parallel Distributed Streaming Framework for near Real-time Science Sensors and High-Resolution Medical Images

We address the challenges of latency, scalability, throughput, and heterogeneous data sources in streaming analytics and deep learning pipelines for science sensors and medical imaging applications. We present a prototype Intelligent Parallel Distributed Streaming Framework (IPDSF) that supports both distributed stream processing and distributed deep learning training in batch mode. IPDSF is designed to run streaming Artificial Intelligence (AI) analytic tasks using data parallelism, including partitioning of multiple streams of short-time sensing data and high-resolution 3D medical images, and fine-grained task distribution. We show the implementation of IPDSF for two real-world applications: (i) an Air Quality Index based on near-real-time streaming of aerosol Lidar backscatter and (ii) data generation of Covid-19 Computed Tomography (CT) scans using deep learning. We evaluate latency, throughput, and scalability, and quantitatively assess training and prediction against a single-instance baseline. Our results show that IPDSF scales to process thousands of streaming science sensors in parallel for the Air Quality Index application. For the medical imaging application, IPDSF trains a novel 3D conditional Generative Adversarial Network (cGAN) on parallel distributed Graphics Processing Unit (GPU) nodes to generate realistic high-resolution 3D CT scans of Covid-19 patient lungs. We show that IPDSF reduces cGAN training time linearly with the number of GPUs.
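
The data-parallel partitioning of multiple sensor streams described above can be illustrated with a minimal sketch. This is not IPDSF's implementation: the function names (`partition_for`, `partition_streams`, `process_partition`, `run`), the stream identifiers, and the per-stream mean used as a stand-in analytic are all hypothetical, and local threads are assumed here in place of distributed worker nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
import zlib


def partition_for(stream_id: str, num_partitions: int) -> int:
    """Stable hash, so every reading from one sensor lands in the same partition."""
    return zlib.crc32(stream_id.encode("utf-8")) % num_partitions


def partition_streams(records, num_partitions):
    """Group (stream_id, value) records by their assigned partition."""
    partitions = defaultdict(list)
    for stream_id, value in records:
        partitions[partition_for(stream_id, num_partitions)].append((stream_id, value))
    return partitions


def process_partition(batch):
    """Placeholder analytic (assumption): mean reading per stream in this partition."""
    per_stream = defaultdict(list)
    for stream_id, value in batch:
        per_stream[stream_id].append(value)
    return {sid: mean(vals) for sid, vals in per_stream.items()}


def run(records, num_partitions=4):
    """Process each partition in its own worker and merge the partial results."""
    partitions = partition_streams(records, num_partitions)
    results = {}
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        for partial in pool.map(process_partition, partitions.values()):
            results.update(partial)  # partitions hold disjoint stream ids
    return results


if __name__ == "__main__":
    sample = [("lidar-a", 1.0), ("lidar-b", 4.0), ("lidar-a", 3.0)]
    print(run(sample, num_partitions=2))
```

Because each stream maps to exactly one partition, partial results never overlap and can be merged without coordination, which is the property that lets this style of processing scale with the number of workers.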
