Using the Multiple Instance Learning framework to address differential regulation

Cell differentiation is a natural process that occurs in all higher organisms from the early fetal stage of life onward. It is also involved in disease, such as cancer, where the cell cycle becomes deregulated and cells behave differently from healthy ones. Differentiation occurs even though all cell types of the same organism share an identical genome. The motivation behind the current work is to understand why this happens. Cells differentiate because they have different gene expression patterns, and the expression of a gene is determined by the genomic features in and around it. One of these features is the binding of Transcription Factors (TFs), proteins that bind to the promoter regions of genes and are responsible for their expression or non-expression. Other genomic features influence the binding of TFs near genes, such as DNA accessibility, DNA methylation levels and histone modifications. The purpose of this study is to identify the genomic features that influence the binding of the TFs responsible for gene expression. Standard (single-instance) classification cannot express that multiple TFs need to bind in a gene's promoter region for the gene to be expressed, nor that the number of TFs varies among genes. In addition, the TF labels are unknown: it is not known which TF or TFs are responsible for a gene's expression. For these reasons, the problem and its data fit the Multiple Instance Learning (MIL) framework. A method is formulated in which each gene is treated as a bag and all of its TF binding sites are treated as instances. The results are promising, as TFs that the method selected as important for gene expression were also found to be important in a biological example.
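The gene-as-bag, binding-site-as-instance formulation can be illustrated with a minimal sketch. It assumes, purely for illustration, that each TF binding site is described by three numeric features (DNA accessibility, DNA methylation level and a histone-modification signal). It also substitutes the bag embedding of MILES (Chen et al., 2006), a general-purpose MIL method, for the specific method of this work, which is not detailed here. The names `GeneBag`, `embed_bag` and `build_embedding` and all feature values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical per-site features: [DNA accessibility, DNA methylation, histone-mark signal]
Instance = Sequence[float]


@dataclass
class GeneBag:
    """One gene = one bag; each instance is one TF binding site in its promoter."""
    gene_id: str
    instances: List[Instance]
    expressed: bool  # bag-level label only; instance (TF) labels are unknown


def similarity(x: Instance, c: Instance, sigma: float) -> float:
    """Gaussian similarity between a binding site x and a concept instance c."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return math.exp(-d2 / sigma ** 2)


def embed_bag(bag: GeneBag, concepts: List[Instance], sigma: float) -> List[float]:
    """MILES-style embedding: for each concept, keep the best match among the bag's sites."""
    if not bag.instances:
        raise ValueError(f"gene {bag.gene_id} has no binding sites")
    return [max(similarity(x, c, sigma) for x in bag.instances) for c in concepts]


def build_embedding(
    bags: List[GeneBag], sigma: float = 1.0
) -> Tuple[List[List[float]], List[Instance]]:
    """Use every training instance as a concept, as MILES does."""
    concepts = [x for bag in bags for x in bag.instances]
    return [embed_bag(b, concepts, sigma) for b in bags], concepts


if __name__ == "__main__":
    # Two toy genes with different numbers of binding sites (made-up values).
    bags = [
        GeneBag("gene_A", [[0.9, 0.1, 0.5], [0.2, 0.8, 0.1]], expressed=True),
        GeneBag("gene_B", [[0.1, 0.9, 0.0]], expressed=False),
    ]
    X, concepts = build_embedding(bags)
    for bag, row in zip(bags, X):
        print(bag.gene_id, bag.expressed, [round(v, 3) for v in row])
```

Each bag is mapped to a vector whose length equals the total number of training instances. As a result, genes with different numbers of binding sites become directly comparable. A sparse (for example, L1-regularized) linear classifier trained on these vectors would then select a small set of informative binding-site instances.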
