Genotype to Phenotype Mapping and the Fitness Landscape of the E. coli lac Promoter

Genotype-to-phenotype maps, and the related fitness landscapes that include epistatic interactions, are difficult to measure because of their high-dimensional structure. Here we construct such a map using recently collected high-throughput sequence data from the 75-base-pair mutagenized E. coli lac promoter region. Each sequence is associated with its phenotype: the induced transcriptional activity, measured by a fluorescent reporter. We find that the additive (non-epistatic) contributions of individual mutations account for about two-thirds of the explainable phenotype variance. Pairwise epistasis explains about 7% of the variance for the full mutagenized sequence and about 15% for the subsequence associated with protein binding sites. Surprisingly, we find no evidence for third-order epistatic contributions, and our inferred fitness landscape is essentially single-peaked, with a small amount of antagonistic epistasis. The wild type is under significant selective pressure, and we infer that it is multi-objective optimal for gene expression in environments with different nutrient sources. We identify the binding sites of the transcription factor CRP and of RNA polymerase in the promoter region, along with their interactions, without difficult optimization steps. In particular, we observe evidence for previously unexplored genetic regulatory mechanisms, possibly kinetic in nature. We conclude with a cautionary note that inferred properties of fitness landscapes may be severely influenced by biases in the sequence data.
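The split of phenotype variance into additive and pairwise epistatic parts can be illustrated with nested regression fits. The following is a minimal sketch under simplifying assumptions, not the authors' inference procedure. It assumes a binary genotype encoding (each position is either wild type or mutated, rather than one of four bases) and uses synthetic data. The sequence length, mutation rate, effect sizes, and the helper names `design_matrix` and `r_squared` are all hypothetical. The additive model gives one term per position, and the pairwise model adds one product term per pair of positions. The increase in R² between the two fits estimates the share of variance attributable to pairwise epistasis.

```python
import itertools
import numpy as np

def design_matrix(X, pairwise=False):
    """Build regression features from a binary genotype matrix X (n x L).

    X[i, j] = 1 if sequence i carries a mutation at position j, else 0
    (a hypothetical simplified encoding). Columns: intercept, one additive
    term per position, and optionally one product term per pair j < k.
    """
    n, L = X.shape
    cols = [np.ones(n), *X.T]
    if pairwise:
        cols += [X[:, j] * X[:, k] for j, k in itertools.combinations(range(L), 2)]
    return np.column_stack(cols)

def r_squared(X, y, pairwise=False):
    """Fraction of the variance of y explained by an ordinary least-squares fit."""
    A = design_matrix(X, pairwise)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

# Synthetic data: all sizes and effects below are assumptions for illustration.
rng = np.random.default_rng(0)
n, L = 5000, 8
X = (rng.random((n, L)) < 0.12).astype(float)   # ~12% mutation rate per site
h = rng.normal(0.0, 1.0, L)                      # additive effects
J = np.zeros((L, L))
J[0, 1] = J[2, 5] = -1.5                         # two pairwise couplings
y = X @ h + np.einsum("ij,jk,ik->i", X, J, X) + rng.normal(0.0, 0.5, n)

r2_add = r_squared(X, y)
r2_pair = r_squared(X, y, pairwise=True)
print(f"additive R^2 = {r2_add:.3f}, gain from pairwise terms = {r2_pair - r2_add:.3f}")
```

In this sketch, R² is measured against the total variance, including measurement noise. The percentages quoted above are instead fractions of the explainable variance. Also, an in-sample fit of the larger model can never explain less than the nested additive one. A real analysis would therefore compare models on held-out data or with regularization, rather than reading the in-sample gain directly.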
