Deep learning for small and big data in psychiatry

Psychiatry today must gain a better understanding of the common and distinct pathophysiological mechanisms underlying psychiatric disorders in order to deliver more effective, person-tailored treatments. To this end, it appears that the analysis of ‘small’ experimental samples using conventional statistical approaches has largely failed to capture the heterogeneity underlying psychiatric phenotypes. Modern algorithms and approaches from machine learning, particularly deep learning, provide new hope of addressing these issues, given their outstanding prediction performance in other disciplines. The strength of deep learning algorithms is that they can efficiently implement very complicated, and in principle arbitrary, predictor-response mappings. This power comes at a cost: the need for large training (and test) samples to infer the model parameters, which can number in the millions. This appears to be at odds with the as yet rather ‘small’ samples available in human psychiatric research (n < 10,000) and with the ambition of predicting treatment at the single-subject level (n = 1). Here, we aim to give a comprehensive overview of how such models can nevertheless be used for prediction in psychiatry. We review how machine learning approaches compare to more traditional, hypothesis-driven statistical approaches, how their complexity relates to the need for large sample sizes, and what we can do to make optimal use of these powerful techniques in psychiatric neuroscience.
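
To make the tension between model complexity and sample size concrete, the following minimal sketch counts the trainable parameters of a small fully connected network and compares that count with a sample of 10,000 subjects. The layer sizes and the helper name `mlp_parameter_count` are illustrative assumptions, not taken from the paper.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected feedforward network.

    layer_sizes lists the number of units per layer, from input to output.
    Each pair of consecutive layers contributes n_in * n_out weights
    plus n_out bias terms.
    """
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )


# Hypothetical architecture (an assumption for illustration only):
# 10,000 input features, two hidden layers, one output unit.
layer_sizes = [10_000, 256, 64, 1]
n_params = mlp_parameter_count(layer_sizes)

# Sample size at the upper end of what the text calls 'small' (n < 10,000).
n_samples = 10_000

print(f"parameters: {n_params:,}")
print(f"parameters per sample: {n_params / n_samples:.1f}")
```

Even this modest network has far more parameters than there are subjects in the sample, which is the regime in which sample size becomes the limiting factor.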
