Analysis and Performance Evaluation of Deep Learning on Big Data

Deep Learning (DL) and Big Data (BD) have converged into a hybrid computing paradigm that combines the dynamic processing of DL models with the computational power of distributed BD frameworks. In this context, this work presents an analysis and performance evaluation of DL applications on BD. The experiments examine how training completion time relates to the model’s loss of precision, and how distributed computing affects DL models. The experiments were performed on Microsoft Azure using the BigDL framework, which allows both Spark and TensorFlow to run on top of a YARN cluster. The results show a speedup of up to 8x and accuracy higher than 95%.
