Disentangling spatio-temporal patterns of brain changes in large-scale brain imaging databases through Independent Gaussian Process Analysis

Introduction: The morphological changes that affect the brain over time are related to several biological processes, governed either by healthy aging or by pathological factors. Since these processes remain largely unknown, we need statistical approaches that automatically identify the latent morphological evolutions through the analysis of structural brain magnetic resonance images (MRIs). Such approaches must scale to high-dimensional volumetric observations in order to be applied to large-scale biomedical databases such as UK Biobank. We present a novel spatio-temporal analysis method that automatically estimates latent spatio-temporal patterns of brain changes from collections of brain MRIs acquired over time. This approach extends standard methods (such as ICA) by encoding priors on the spatial and temporal properties of the signal measured in brain images.

Methods: Our method models the observed data as a matrix factorization into temporal sources and spatial sources. The temporal sources are treated as independent Gaussian processes, which promotes smoothness in time and models a plausible aging evolution. The spatial sources are modeled as Gaussian random fields, which encodes the spatial continuity of the brain sub-structures. This structure allows the spatial covariance matrix to be factorized as a Kronecker product over the three spatial dimensions, which greatly simplifies computations and reduces the size of the matrices involved. The overall model is efficiently optimized through stochastic variational inference.

Results: We tested our model on synthetic data. We generated statistically independent temporal sources, together with spatial sources in the form of smooth heatmaps, and mixed them into noisy observations. We then trained our model to decompose the observed data into two matrices that best fit these mixed observations. In Fig. 1, plot A shows the original temporal sources (in red) and their approximation by the independent Gaussian processes (in blue).
Likewise, plot B shows the manually generated heatmaps (top) and, beneath them, the maps estimated by the algorithm (bottom). The method is able to recover the original sources from the noisy observations given as target.

Conclusion: This method may ultimately provide an ideal exploratory tool for analyzing large-scale medical imaging datasets such as UK Biobank. It scales efficiently to both high-dimensional data and large sample sizes, and it identifies hidden spatio-temporal processes in a completely unsupervised manner.
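
The computational benefit of the Kronecker factorization described in the Methods can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a squared-exponential kernel along each spatial axis and a tiny 4 x 5 x 6 grid, and the helper names (`rbf_kernel`, `kron_matvec`) are hypothetical. The sketch shows that a product with the full spatial covariance, and a draw from the corresponding Gaussian random field, can both be computed from the three small per-axis matrices alone.

```python
import numpy as np


def rbf_kernel(coords, length_scale, jitter=1e-6):
    """Squared-exponential covariance on a 1-D grid (assumed kernel choice)."""
    d = coords[:, None] - coords[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2) + jitter * np.eye(len(coords))


def kron_matvec(Ax, Ay, Az, v):
    """Compute (Ax kron Ay kron Az) @ v without forming the full matrix."""
    V = v.reshape(Ax.shape[1], Ay.shape[1], Az.shape[1])
    # Apply each small factor along its own spatial axis.
    out = np.einsum("ai,bj,ck,ijk->abc", Ax, Ay, Az, V, optimize=True)
    return out.reshape(-1)


# Tiny illustrative grid; real brain volumes are far larger.
nx, ny, nz = 4, 5, 6
Kx = rbf_kernel(np.arange(nx, dtype=float), length_scale=1.5)
Ky = rbf_kernel(np.arange(ny, dtype=float), length_scale=1.5)
Kz = rbf_kernel(np.arange(nz, dtype=float), length_scale=1.5)

rng = np.random.default_rng(0)
v = rng.standard_normal(nx * ny * nz)

# Structured product versus the explicit (nx*ny*nz) x (nx*ny*nz) covariance.
fast = kron_matvec(Kx, Ky, Kz, v)
full = np.kron(np.kron(Kx, Ky), Kz) @ v
max_abs_diff = float(np.max(np.abs(fast - full)))

# The Cholesky factor of a Kronecker product is the Kronecker product of the
# per-axis Cholesky factors, so a smooth random field can be drawn the same way.
Lx, Ly, Lz = (np.linalg.cholesky(K) for K in (Kx, Ky, Kz))
field = kron_matvec(Lx, Ly, Lz, rng.standard_normal(nx * ny * nz))
field = field.reshape(nx, ny, nz)

# Number of covariance entries stored in each representation.
full_entries = (nx * ny * nz) ** 2
factored_entries = nx**2 + ny**2 + nz**2
```

On this toy grid the factored form stores `factored_entries` values instead of `full_entries`, and the gap widens rapidly as the grid grows, which is why the factorization matters for volumetric images.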