Deep learning for predicting disease status using genomic data

Predicting disease status for a complex human disease using genomic data is an important, yet challenging, step in personalized medicine. Among the many challenges, the so-called curse of dimensionality leads to unsatisfactory performance in many state-of-the-art machine learning algorithms. A major recent advance in machine learning is the rapid development of deep learning algorithms, which can efficiently extract meaningful features from high-dimensional and complex datasets through a stacked, hierarchical learning process. Deep learning has shown breakthrough performance in several areas, including image recognition, natural language processing, and speech recognition. However, its performance in predicting disease status from genomic datasets is still not well studied. In this article, we reviewed the four relevant articles identified through a thorough literature search. All four articles used autoencoders to project high-dimensional genomic data onto a low-dimensional space and then applied state-of-the-art machine learning algorithms to predict disease status from the low-dimensional representations. In the reviewed articles, this deep learning approach outperformed existing prediction approaches, such as prediction based on probe-wise screening and prediction based on principal component analysis. The limitations of the current deep learning approach and possible improvements are also discussed.
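The two-stage approach described above (an autoencoder that compresses the features without using labels, followed by a classifier trained on the compressed codes) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the reviewed articles: the data are synthetic, the autoencoder has a single tanh hidden layer of 8 units, logistic regression stands in for the downstream classifier, and both models are fitted with a small hand-written Adam optimizer in NumPy. The helper names (make_data, adam_minimize, autoencoder_grad, logistic_grad) are hypothetical.

```python
import numpy as np

def make_data(n=150, p=200, n_latent=3, seed=0):
    """Synthetic stand-in for genomic data: few samples, many features."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, n_latent))           # hidden factors
    A = rng.normal(size=(n_latent, p))           # loading of each feature on the factors
    X = Z @ A + 0.5 * rng.normal(size=(n, p))    # observed features plus noise
    y = (Z[:, 0] > 0).astype(float)              # disease status driven by one factor
    return X, y

def adam_minimize(params, grad_fn, steps, lr):
    """Minimal Adam loop; grad_fn(params) returns (loss, list of gradients)."""
    m = [np.zeros_like(q) for q in params]
    v = [np.zeros_like(q) for q in params]
    losses = []
    for t in range(1, steps + 1):
        loss, grads = grad_fn(params)
        losses.append(loss)
        for i, g in enumerate(grads):
            m[i] = 0.9 * m[i] + 0.1 * g
            v[i] = 0.999 * v[i] + 0.001 * g * g
            m_hat = m[i] / (1 - 0.9 ** t)        # bias-corrected first moment
            v_hat = v[i] / (1 - 0.999 ** t)      # bias-corrected second moment
            params[i] -= lr * m_hat / (np.sqrt(v_hat) + 1e-8)
    return params, losses

def autoencoder_grad(X):
    """Mean squared reconstruction error of a tanh encoder with a linear decoder."""
    def grad_fn(params):
        W1, b1, W2, b2 = params
        H = np.tanh(X @ W1 + b1)                 # low-dimensional code
        E = H @ W2 + b2 - X                      # reconstruction error
        dR = 2.0 * E / E.size
        dZ = (dR @ W2.T) * (1.0 - H ** 2)        # backpropagate through tanh
        return float(np.mean(E ** 2)), [X.T @ dZ, dZ.sum(0), H.T @ dR, dR.sum(0)]
    return grad_fn

def logistic_grad(C, y):
    """Cross-entropy loss of logistic regression on the codes C."""
    def grad_fn(params):
        w, b = params
        prob = 1.0 / (1.0 + np.exp(-(C @ w + b)))
        loss = -np.mean(y * np.log(prob + 1e-12) + (1 - y) * np.log(1 - prob + 1e-12))
        d = (prob - y) / len(y)
        return float(loss), [C.T @ d, np.array([d.sum()])]
    return grad_fn

X, y = make_data()
train, test = np.arange(100), np.arange(100, 150)
mu, sd = X[train].mean(0), X[train].std(0)
Xs = (X - mu) / sd                               # standardize with training statistics only

# Stage 1: fit the autoencoder on training features; labels are not used here.
rng = np.random.default_rng(1)
p, k = Xs.shape[1], 8                            # k = assumed size of the code
init = [rng.normal(0, 1 / np.sqrt(p), (p, k)), np.zeros(k),
        rng.normal(0, 1 / np.sqrt(k), (k, p)), np.zeros(p)]
(W1, b1, W2, b2), ae_losses = adam_minimize(init, autoencoder_grad(Xs[train]),
                                            steps=500, lr=0.003)
codes = np.tanh(Xs @ W1 + b1)                    # encode all samples

# Stage 2: train a classifier on the low-dimensional codes.
(w, b), clf_losses = adam_minimize([np.zeros(k), np.zeros(1)],
                                   logistic_grad(codes[train], y[train]),
                                   steps=400, lr=0.05)
pred = (codes[test] @ w + b > 0).astype(float)
test_accuracy = float(np.mean(pred == y[test]))
print(f"reconstruction MSE: {ae_losses[0]:.3f} -> {ae_losses[-1]:.3f}")
print(f"held-out accuracy: {test_accuracy:.2f}")
```

In this sketch the encoder is fitted only on the training samples and then applied unchanged to the held-out samples, so the reported accuracy is not inflated by information from the test set. A stacked autoencoder would repeat stage 1 layer by layer, feeding each layer's codes to the next.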
