Big Data Approaches for the Analysis of Large-Scale fMRI Data Using Apache Spark and GPU Processing: A Demonstration on Resting-State fMRI Data from the Human Connectome Project

Technologies for the scalable analysis of very large datasets have emerged in the domain of internet computing, but they are still rarely used in neuroimaging, even though the field, and fMRI in particular, has both the data and the research questions that call for efficient computation tools. In this work, we present software tools for applying Apache Spark and Graphics Processing Units (GPUs) to neuroimaging datasets; in particular, we provide distributed file input for 4D NIfTI fMRI datasets in Scala for use in an Apache Spark environment. Examples of using this Big Data platform for graph analysis of fMRI datasets illustrate how processing pipelines that employ it can be developed. With more tools for the convenient integration of neuroimaging file formats and typical processing steps, Big Data technologies could find wider adoption in the community. This would open up a range of potentially useful applications, especially given the current collaborative creation of large data repositories that include thousands of individual fMRI datasets.
