Choosing panels of genomics assays using submodular optimization

Because sequencing-based genomics assays such as ChIP-seq and DNase-seq are expensive, the epigenomic characterization of a cell type is typically carried out using a small panel of assay types. Deciding a priori which assays to perform is therefore a critical step in many studies. We present submodular selection of assays (SSA), a method that uses techniques from submodular optimization to choose a diverse panel of genomics assays. More generally, this application serves as a model for how submodular optimization can be applied to other discrete problems in biology.
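To make the idea of selecting a diverse panel by submodular optimization concrete, the following is a minimal sketch, not the SSA implementation. It assumes a facility-location objective over a pairwise assay-similarity matrix and maximizes it with the standard greedy algorithm, which for monotone submodular functions under a cardinality constraint is guaranteed to reach at least a (1 - 1/e) fraction of the optimum. The abstract does not state which objective SSA uses, so the objective, the function names, and the toy similarity values are all illustrative assumptions.

```python
import numpy as np


def facility_location(similarity, selected):
    """Facility-location value f(S) = sum_i max_{j in S} similarity[i, j].

    Assumed objective for illustration only; f(empty set) is defined as 0.
    """
    if not selected:
        return 0.0
    return float(similarity[:, selected].max(axis=1).sum())


def greedy_select(similarity, k):
    """Greedily pick up to k items, each time adding the largest marginal gain."""
    n = similarity.shape[0]
    selected = []
    for _ in range(min(k, n)):
        current = facility_location(similarity, selected)
        best_gain, best_j = -np.inf, None
        for j in range(n):
            if j in selected:
                continue
            gain = facility_location(similarity, selected + [j]) - current
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
    return selected


# Hypothetical similarities among four assay types: assays 0 and 1 are
# near-duplicates of each other, as are assays 2 and 3 (made-up values).
similarity = np.array([
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])

panel = greedy_select(similarity, k=2)
print("selected assay indices:", panel)
```

On this toy matrix the greedy procedure picks one assay from each correlated pair rather than two near-duplicates, because the second member of a pair adds little marginal value once the first has been chosen. That diminishing-returns behavior is the property that makes submodular objectives a natural fit for panel selection.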
