Genomic transcription regulatory element location analysis via Poisson weighted LASSO

The distances between DNA transcription regulatory elements (TREs) provide important clues to their dependencies and function within the gene regulation process. However, the locations of these TREs, as well as the cross-distances between their occurrences, are stochastic, in part because of the inherent limitations of the next-generation sequencing methods used to localize them and in part because of biology itself. This paper describes a novel approach to analyzing these locations and their cross-distances, even at long range, via a Poisson random convolution. The resulting deconvolution problem is ill-posed, and sparsity regularization is used to offset this challenge. Unlike previous work on sparse Poisson inverse problems, this paper adopts a weighted LASSO estimator with data-dependent weights calculated using concentration inequalities that account for the Poisson noise. This method exhibits better squared-error performance than the classical (unweighted) LASSO, both in theoretical performance bounds and in simulation studies, and it can easily be computed using off-the-shelf LASSO solvers.
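The remark that a weighted LASSO can be computed with an off-the-shelf solver can be illustrated with a short sketch. It relies on the standard change of variables z_j = w_j x_j, which turns the weighted penalty into an ordinary ℓ1 penalty on a column-rescaled design. Everything named here is an assumption for illustration: the small coordinate-descent routine `lasso_cd` stands in for any plain LASSO solver, the random binary matrix `A` stands in for the paper's random-convolution design, and the Bernstein-style `weights` are a placeholder, not the data-dependent weights derived in the paper.

```python
import numpy as np

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(A, y, lam, n_iter=500):
    """Plain LASSO: minimize 0.5*||y - A x||^2 + lam*||x||_1.

    A stand-in for an off-the-shelf solver (cyclic coordinate descent).
    """
    n, p = A.shape
    x = np.zeros(p)
    col_sq = (A ** 2).sum(axis=0)
    r = y.astype(float).copy()          # running residual y - A x
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += A[:, j] * x[j]         # remove coordinate j from the fit
            x[j] = soft_threshold(A[:, j] @ r, lam) / col_sq[j]
            r -= A[:, j] * x[j]         # add the updated coordinate back
    return x

def weighted_lasso(A, y, weights):
    """Minimize 0.5*||y - A x||^2 + sum_j weights[j]*|x_j|.

    Substituting z_j = weights[j] * x_j gives a plain LASSO with lam = 1
    on the design whose column j is A[:, j] / weights[j].
    Assumes every weight is strictly positive.
    """
    weights = np.asarray(weights, dtype=float)
    z = lasso_cd(A / weights, y, lam=1.0)
    return z / weights

# Toy Poisson data (hypothetical sizes and signal, not from the paper).
rng = np.random.default_rng(0)
n, p = 60, 20
A = rng.binomial(1, 0.5, size=(n, p)).astype(float)
x_true = np.zeros(p)
x_true[[2, 7, 11]] = [5.0, 3.0, 4.0]
y = rng.poisson(A @ x_true).astype(float)

# Placeholder data-dependent weights: a Bernstein-style bound built from a
# variance proxy sum_i A[i, j]^2 * y[i]. This is NOT the paper's formula.
log_p = np.log(p)
weights = np.sqrt(2.0 * log_p * ((A ** 2).T @ y)) + (log_p / 3.0) * np.abs(A).max(axis=0)

x_hat = weighted_lasso(A, y, weights)
```

The rescaling leaves the solver untouched, so the same wrapper would apply to any LASSO implementation that accepts a design matrix and a single regularization parameter.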
