Intelligent and effective informatic deconvolution of “Big Data” and its future impact on the quantitative nature of neurodegenerative disease therapy

Biomedical data sets are growing ever larger, and a plethora of high-dimensionality data sets (“Big Data”) are now freely accessible for neurodegenerative diseases such as Alzheimer's disease. It is therefore important to develop new informatic analysis platforms that organize and interrogate Big Data resources and turn them into a rational, actionable mechanism for advanced therapeutic development. This will entail generating systems and tools that allow cross-platform correlation between data sets of distinct types, for example, transcriptomic, proteomic, and metabolomic. Here, we provide a comprehensive overview of the latest strategies, including latent semantic analytics, topological data investigation, and deep learning techniques, that will drive the future development of diagnostic and therapeutic applications for Alzheimer's disease. We contend that diverse informatic “Big Data” platforms should be synergistically designed with more advanced chemical/drug and cellular/tissue-based phenotypic analytical predictive models to assist in either de novo drug design or effective drug repurposing.
