Recent Advances in Variational Autoencoders With Representation Learning for Biomedical Informatics: A Survey

Variational autoencoders (VAEs) are deep latent space generative models that have been highly successful in many applications in biomedical informatics, including molecular design, protein design, medical image classification and segmentation, integrated multi-omics data analysis, and large-scale biological sequence analysis. The fundamental idea in VAEs is to learn the distribution of the data in such a way that new, meaningful data with greater intra-class variation can be generated from the encoded distribution. The ability of VAEs to synthesize new data with greater representation variance at state-of-the-art levels offers hope that the chronic scarcity of labeled data in the biomedical field can be resolved. Furthermore, VAEs have made nonlinear latent variable models tractable for modeling complex distributions. This has allowed efficient extraction of relevant biomedical information from the features learned on biological data sets, a process referred to as unsupervised feature representation learning. In this article, we review recent advances in the development and application of VAEs for biomedical informatics, and we discuss challenges and future opportunities for biomedical research with respect to VAEs.
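
As a rough illustration of the idea of learning an encoded distribution from which new data can be generated, the sketch below shows the two ingredients commonly used in a standard Gaussian VAE: reparameterized sampling of a latent code, and the closed-form KL term that regularizes the encoded distribution toward a standard normal prior. This is a minimal sketch under stated assumptions, not the formulation of any specific model covered by the survey: the function names, the diagonal Gaussian posterior, and the example encoder outputs are all hypothetical, and the encoder and decoder networks are left out.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def negative_elbo(recon_error, mu, log_var):
    """Single-sample training loss: reconstruction error plus KL regularizer."""
    return recon_error + kl_to_standard_normal(mu, log_var)

# Hypothetical encoder outputs for one input (assumed values, for illustration only).
rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -2.0]

z = sample_latent(mu, log_var, rng)       # latent code passed to a decoder
kl = kl_to_standard_normal(mu, log_var)   # regularization term of the loss
```

In this setup, new samples would be generated by drawing `z` from the prior (or from the encoded distribution of a real example) and passing it through a trained decoder, which is omitted here.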
