Hadoop-EDF: Large-scale Distributed Processing of Electrophysiological Signal Data in Hadoop MapReduce

A rapidly growing volume of electrophysiological signal data is being generated for clinical research on neurological disorders. European Data Format (EDF) is a standard format for storing electrophysiological signals. However, existing signal analysis tools load large EDF files sequentially before analyzing them, which becomes the bottleneck when handling large-scale datasets. To overcome this, we develop Hadoop-EDF, a distributed signal processing tool that loads EDF data in parallel using Hadoop MapReduce. Hadoop-EDF uses a robust data partition algorithm that makes EDF data processable in parallel. We evaluate Hadoop-EDF's scalability and performance using two datasets from the National Sleep Research Resource, running experiments on Amazon Web Services clusters. On a 20-node cluster, Hadoop-EDF was about 26 times faster than sequential processing for 200 small files and about 47 times faster for 200 large files. The results demonstrate that Hadoop-EDF is effective for processing EDF data and is especially well suited to large EDF files.
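To illustrate why EDF data can be partitioned for parallel loading at all, the following minimal Python sketch reads the fixed-layout EDF header, derives the byte size of one data record, and splits the record range into contiguous byte ranges aligned on record boundaries. It is not the partition algorithm of Hadoop-EDF, which the abstract does not describe; the function names (`make_demo_header`, `parse_edf_header`, `partition_records`) and the even-split policy are assumptions made for illustration. Only the header layout (256-byte fixed header, 256 bytes per signal, 2-byte samples) comes from the EDF specification.

```python
import io

def make_demo_header(num_records, samples_per_record):
    """Build a synthetic EDF header (hypothetical helper, for the demo only)."""
    ns = len(samples_per_record)
    header_bytes = 256 + 256 * ns
    fixed = (
        "0".ljust(8)                   # version
        + "".ljust(80)                 # patient identification
        + "".ljust(80)                 # recording identification
        + "01.01.00"                   # start date
        + "00.00.00"                   # start time
        + str(header_bytes).ljust(8)   # total header size in bytes
        + "".ljust(44)                 # reserved
        + str(num_records).ljust(8)    # number of data records
        + "1".ljust(8)                 # duration of one data record (seconds)
        + str(ns).ljust(4)             # number of signals
    )
    per_signal = (
        "".ljust(216 * ns)             # labels through prefiltering, left blank
        + "".join(str(n).ljust(8) for n in samples_per_record)
        + "".ljust(32 * ns)            # reserved
    )
    return (fixed + per_signal).encode("ascii")

def parse_edf_header(stream):
    """Return (header_bytes, num_records, record_bytes) from an EDF stream."""
    fixed = stream.read(256)
    header_bytes = int(fixed[184:192].decode("ascii").strip())
    num_records = int(fixed[236:244].decode("ascii").strip())
    num_signals = int(fixed[252:256].decode("ascii").strip())
    signal_header = stream.read(256 * num_signals)
    # The "samples per data record" fields follow 216 bytes per signal
    # of other per-signal fields (16 + 80 + 8 + 8 + 8 + 8 + 8 + 80).
    start = 216 * num_signals
    samples = [
        int(signal_header[start + 8 * i:start + 8 * (i + 1)].decode("ascii").strip())
        for i in range(num_signals)
    ]
    record_bytes = 2 * sum(samples)    # each EDF sample is a 2-byte integer
    return header_bytes, num_records, record_bytes

def partition_records(header_bytes, num_records, record_bytes, num_splits):
    """Split data records into contiguous, record-aligned byte ranges.

    Returns a list of (first_record, record_count, byte_offset, byte_length).
    """
    base, extra = divmod(num_records, num_splits)
    splits, first = [], 0
    for i in range(num_splits):
        count = base + (1 if i < extra else 0)
        if count == 0:
            continue
        offset = header_bytes + first * record_bytes
        splits.append((first, count, offset, count * record_bytes))
        first += count
    return splits

# Demo: 10 data records, two signals with 256 and 128 samples per record.
demo = make_demo_header(10, [256, 128])
header_bytes, num_records, record_bytes = parse_edf_header(io.BytesIO(demo))
splits = partition_records(header_bytes, num_records, record_bytes, 3)
for split in splits:
    print(split)
```

Because every data record has the same byte length, each worker can seek directly to its own offset and decode its records independently, which is the property a record-aligned partition relies on.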
