An overview of online based platforms for sharing and analyzing electrophysiology data from big data perspective

With the development of applications and high-throughput sensor technologies in medical fields, scientists and other research professionals face a major challenge: how to manage and analyze the large electrophysiological datasets that these sensors produce. The challenge has three aspects. First, the data are large, usually exceeding terabytes in size. Second, the data are stored in many different formats. Third, most of these unstructured, semi-structured, or structured datasets remain on researchers' local computers in their own laboratories, where they are not openly accessible and become isolated data islands. How to overcome this challenge and share and mine the scientific data has therefore become an important research topic. The aim of this paper is to systematically review recently published research on web-based electrophysiological data platforms from the perspective of cloud computing and programming frameworks. Based on this review, we suggest that a conceptual scientific workflow-based programming framework, combined with an elastic cloud computing environment running big data tools (such as Hadoop and Spark), is a good choice for facilitating effective data mining and collaboration among scientists. WIREs Data Mining Knowl Discov 2017, 7:e1206. doi: 10.1002/widm.1206
