Diet Networks: Thin Parameters for Fat Genomics

Learning tasks such as those involving genomic data often pose a serious challenge: the number of input features can be orders of magnitude larger than the number of training examples, making it difficult to avoid overfitting, even with established regularization techniques. We focus here on tasks in which the input describes the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact on precision medicine, where high-dimensional data about a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, the mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization that considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn, with another neural network called the parameter prediction network, how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights that link the value of the feature to each of the hidden units). We show experimentally, on a population stratification task of interest to medical studies, that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.
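The parametrization described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the names (`init_params`, `predict_first_layer`, `forward`), the layer sizes, the single-layer `tanh` prediction network, and the fixed random feature embeddings are all assumptions chosen for illustration. The sketch shows only the forward pass and compares the first-layer parameter count of a naive classifier with that of a classifier whose first-layer weights are predicted from per-feature embeddings.

```python
import numpy as np


def init_params(n_embed, n_hidden, n_classes, rng):
    """Free parameters of the sketch (hypothetical layout, not from the paper)."""
    return {
        # Parameter prediction network: maps one feature embedding
        # to that feature's row of first-layer classifier weights.
        "W_pred": rng.normal(0.0, 0.1, size=(n_embed, n_hidden)),
        "b_hidden": np.zeros(n_hidden),
        "W_out": rng.normal(0.0, 0.1, size=(n_hidden, n_classes)),
        "b_out": np.zeros(n_classes),
    }


def predict_first_layer(feature_embeddings, params):
    """Predict the (n_features, n_hidden) first-layer weight matrix."""
    return np.tanh(feature_embeddings @ params["W_pred"])


def forward(x, feature_embeddings, params):
    """Classifier forward pass using the predicted first-layer weights."""
    w1 = predict_first_layer(feature_embeddings, params)
    hidden = np.maximum(0.0, x @ w1 + params["b_hidden"])  # ReLU
    logits = hidden @ params["W_out"] + params["b_out"]
    logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n_samples, n_features, n_embed, n_hidden, n_classes = 8, 10_000, 16, 32, 5

# Ternary SNP-like inputs (values 0, 1, 2); synthetic data, for shape only.
x = rng.integers(0, 3, size=(n_samples, n_features)).astype(float)

# Assumption: embeddings are provided and fixed (random placeholders here),
# so they are not counted as free parameters.
feature_embeddings = rng.normal(size=(n_features, n_embed))

params = init_params(n_embed, n_hidden, n_classes, rng)
probs = forward(x, feature_embeddings, params)

naive_first_layer = n_features * n_hidden  # one weight per (feature, hidden unit)
diet_first_layer = params["W_pred"].size   # independent of n_features
print(naive_first_layer, diet_first_layer)
```

With these assumed sizes, the naive first layer has 10,000 × 32 = 320,000 weights, while the prediction network has 16 × 32 = 512, and the second figure does not grow with the number of input features. If the embeddings were learned rather than provided, their parameters would have to be counted as well.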
