Computational prediction of gene structure and regulation in the genome of Drosophila melanogaster

Again… Felix qui potuit rerum cognoscere causas. This thesis is dedicated to the memory of my grandmother Maria Reese from Eichenberg.

[1]  R. Palmer,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[2]  J. C. Shepherd Method to determine the reading frame of a protein from the purine/pyrimidine genome sequence and its possible evolutionary justification. , 1981, Proceedings of the National Academy of Sciences of the United States of America.

[3]  M. Borodovsky,et al.  GeneMark.hmm: new solutions for gene finding. , 1998, Nucleic acids research.

[4]  G. Akusjärvi,et al.  Gene expression, regulation of , 1995 .

[5]  Steven Salzberg,et al.  Finding Genes in DNA with a Hidden Markov Model , 1997, J. Comput. Biol..

[6]  C Venclovas,et al.  Processing and analysis of CASP3 protein structure predictions , 1999, Proteins.

[7]  J. Fickett,et al.  Eukaryotic promoter recognition. , 1997, Genome research.

[8]  W A Koppensteiner,et al.  An attempt to analyse progress in fold recognition from CASP1 to CASP3 , 1999, Proteins.

[9]  A. D. McLachlan,et al.  A method for measuring the non-random bias of a codon usage table. , 1984, Nucleic acids research.

[10]  E. Snyder,et al.  Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks. , 1993, Nucleic acids research.

[11]  J. Claverie Computational methods for the identification of genes in vertebrate genomic sequences. , 1997, Human molecular genetics.

[12]  G. Rubin,et al.  A computer program for aligning a cDNA sequence with a genomic DNA sequence. , 1998, Genome research.

[13]  R. Tjian,et al.  Three-dimensional structure of the human TFIID-IIA-IIB complex. , 1999, Science.

[14]  J. Fickett,et al.  Identification of regulatory regions which confer muscle-specific gene expression. , 1998, Journal of molecular biology.

[15]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986 .

[16]  Sean R. Eddy,et al.  Profile hidden Markov models , 1998, Bioinform..

[17]  R. Conaway,et al.  General initiation factors for RNA polymerase II. , 1993, Annual review of biochemistry.

[18]  M. Green,et al.  Biochemical mechanisms of constitutive and regulated pre-mRNA splicing. , 1991, Annual review of cell biology.

[19]  A. A. Mullin,et al.  Principles of neurodynamics , 1962 .

[20]  R. Roeder,et al.  The role of general initiation factors in transcription by RNA polymerase II. , 1996, Trends in biochemical sciences.

[21]  R. Guigó,et al.  Computational gene identification , 1997, Journal of Molecular Medicine.

[22]  Juri Rappsilber,et al.  Mass spectrometry and EST-database searching allows characterization of the multi-protein spliceosome complex , 1998, Nature Genetics.

[23]  S. Bryant,et al.  Critical assessment of methods of protein structure prediction (CASP): Round II , 1997, Proteins.

[24]  Elmar Nöth,et al.  Interpolated markov chains for eukaryotic promoter recognition , 1999, Bioinform..

[25]  Geoffrey E. Hinton,et al.  A time-delay neural network architecture for isolated word recognition , 1990, Neural Networks.

[26]  B. Matthews Comparison of the predicted and observed secondary structure of T4 phage lysozyme. , 1975, Biochimica et biophysica acta.

[27]  P Green,et al.  Base-calling of automated sequencer traces using phred. II. Error probabilities. , 1998, Genome research.

[28]  M Levitt,et al.  Competitive assessment of protein fold recognition and alignment accuracy , 1997, Proteins.

[29]  Geoffrey E. Hinton,et al.  Learning representations by back-propagating errors , 1986, Nature.

[30]  Anders Gorm Pedersen,et al.  Investigations of Escherichia coli Promoter Sequences with Artificial Neural Networks: New Signals Discovered Upstream of the Transcriptional Startpoint , 1995, ISMB.

[31]  P. Werbos,et al.  Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .

[32]  P. Bucher Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. , 1990, Journal of molecular biology.

[33]  Geoffrey E. Hinton,et al.  Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..

[34]  S. Lewis,et al.  Genome annotation assessment in Drosophila melanogaster. , 2000, Genome research.

[35]  A. Lapedes,et al.  Determination of eukaryotic protein coding regions using neural networks and information theory. , 1992, Journal of molecular biology.

[36]  D. Sankoff Efficient optimal decomposition of a sequence into disjoint regions, each matched to some template in an inventory. , 1992, Mathematical biosciences.

[37]  F E Penotti,et al.  Human DNA TATA boxes and transcription initiation sites. A statistical study. , 1990, Journal of molecular biology.

[38]  Jean-Michel Claverie,et al.  Detection of Eukaryotic Promoters Using Markov Transition Matrices , 1997, Comput. Chem..

[39]  Thomas Werner,et al.  Muscle actin genes: A first step towards computational classification of tissue specific promoters , 1998, Silico Biol..

[40]  Anders Krogh,et al.  Two Methods for Improving Performance of a HMM and their Application for Gene Finding , 1997, ISMB.

[41]  S Brunak,et al.  Analysis of eukaryotic promoter sequences reveals a systematically occurring CT-signal. , 1995, Nucleic acids research.

[42]  D. S. Prestridge Predicting Pol II promoter sequences using transcription factor binding sites. , 1995, Journal of molecular biology.

[43]  R. Linsker,et al.  A measure of DNA periodicity. , 1986, Journal of theoretical biology.

[44]  Alexander E. Kel,et al.  Eukaryotic promoter recognition by binding sites for transcription factors , 1995, Comput. Appl. Biosci..

[45]  Temple F. Smith,et al.  Prediction of gene structure. , 1992, Journal of molecular biology.

[46]  M. O'Neill,et al.  Training back-propagation neural networks to define and detect DNA-binding sites. , 1991, Nucleic acids research.

[47]  M. Gelfand,et al.  Prediction of the exon-intron structure by a dynamic programming approach. , 1993, Bio Systems.

[48]  G. Zhou,et al.  Neural network optimization for E. coli promoter prediction. , 1991, Nucleic acids research.

[49]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[50]  B. Pugh,et al.  Mechanisms of transcription complex assembly. , 1996, Current opinion in cell biology.

[51]  D. Haussler,et al.  A hidden Markov model that finds genes in E. coli DNA. , 1994, Nucleic acids research.

[52]  Thomas Werner,et al.  Functional promoter modules can be detected by formal models independent of overall nucleotide sequence similarity , 1999, Bioinform..

[53]  Sean R. Eddy,et al.  Pfam: multiple sequence alignments and HMM-profiles of protein domains , 1998, Nucleic Acids Res..

[54]  M H Skolnick,et al.  A probabilistic model for detecting coding regions in DNA sequences. , 1994, IMA journal of mathematics applied in medicine and biology.

[55]  P Bucher,et al.  CCAAT box revisited: bidirectionality, location and context. , 1988, Journal of biomolecular structure & dynamics.

[56]  P Chambon,et al.  Organization and expression of eucaryotic split genes coding for proteins. , 1981, Annual review of biochemistry.

[57]  A. O'Shea-Greenfield,et al.  Roles of TATA and initiator elements in determining the start site location and direction of RNA polymerase II transcription. , 1992, The Journal of biological chemistry.

[58]  Chris A. Fields,et al.  gm: a practical tool for automating DNA sequence analysis , 1990, Comput. Appl. Biosci..

[59]  Ying Xu,et al.  Detection of RNA Polymerase II Promoters and Polyadenylation Sites in Human DNA Sequence , 1996, Comput. Chem..

[60]  Steen Knudsen,et al.  Promoter2.0: for the recognition of PolII promoter sequences , 1999, Bioinform..

[61]  J. Murphy,et al.  Alterations of genetic material for analysis of alcohol dehydrogenase isozymes of Drosophila melanogaster , 1968, Annals of the New York Academy of Sciences.

[62]  Phillip A. Sharp,et al.  Splicing of Precursors to mRNA by the Spliceosome , 1993 .

[63]  S. Smale,et al.  Transcription initiation from TATA-less promoters within eukaryotic protein-coding genes. , 1997, Biochimica et biophysica acta.

[64]  E. Trifonov,et al.  The pitch of chromatin DNA is reflected in its nucleotide sequence. , 1980, Proceedings of the National Academy of Sciences of the United States of America.

[65]  I E Auger,et al.  Algorithms for the optimal identification of segment neighborhoods. , 1989, Bulletin of mathematical biology.

[66]  M. Roberti,et al.  Identification of human GC‐box‐binding zinc finger protein, a new Krüppel‐like zinc finger protein, by the yeast one‐hybrid screening with a GC‐rich target sequence , 1999, FEBS letters.

[67]  David Haussler,et al.  Improved splice site detection in Genie , 1997, RECOMB '97.

[68]  C J Michel,et al.  New statistical approach to discriminate between protein coding and non-coding regions in DNA sequences and its evaluation. , 1986, Journal of theoretical biology.

[69]  R. Durbin,et al.  Pfam: A comprehensive database of protein domain families based on seed alignments , 1997, Proteins.

[70]  S Brunak,et al.  Distance distributions in proteins: a six-parameter representation. , 1996, Protein engineering.

[71]  David Haussler,et al.  A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA , 1996, ISMB.

[72]  Christopher B. Burge,et al.  Classification of Introns: U2-Type or U12-Type , 1997, Cell.

[73]  R. Tjian,et al.  An interplay between TATA box-binding protein and transcription factors IIE and IIA modulates DNA binding and transcription. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[74]  S K Burley,et al.  Biochemistry and structural biology of transcription factor IID (TFIID). , 1996, Annual review of biochemistry.

[75]  Ying Xu,et al.  Constructing gene models from accurately predicted exons: an application of dynamic programming , 1994, Comput. Appl. Biosci..

[76]  T. Hubbard,et al.  Critical assessment of methods of protein structure prediction (CASP): Round III , 1999, Proteins.

[77]  M Kanehisa,et al.  An assessment of neural network and statistical approaches for prediction of E. coli promoter sites. , 1992, Nucleic acids research.

[78]  R. Guigó,et al.  Evaluation of gene structure prediction programs. , 1996, Genomics.

[79]  R George,et al.  An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region. , 1999, Genetics.

[80]  R. Kohler Lords of the fly: Drosophila genetics and the experimental life. , 1995 .

[81]  E. Trifonov Translation framing code and frame-monitoring mechanism as suggested by the analysis of mRNA and 16 S rRNA nucleotide sequences. , 1987, Journal of molecular biology.

[82]  S. Karlin,et al.  Finding the genes in genomic DNA. , 1998, Current opinion in structural biology.

[83]  R. Kraus,et al.  Functional binding of the "TATA" box binding component of transcription factor TFIID to the -30 region of TATA-less promoters. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[84]  M S Gelfand,et al.  Computer prediction of the exon-intron structure of mammalian pre-mRNAs. , 1990, Nucleic acids research.

[85]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[86]  Anders Krogh,et al.  Introduction to the theory of neural computation , 1994, The advanced book program.

[87]  Jean-Michel Claverie,et al.  Heuristic informational analysis of sequences , 1986, Nucleic Acids Res..

[88]  Xin Chen,et al.  TRANSFAC: an integrated system for gene expression regulation , 2000, Nucleic Acids Res..

[89]  Stephen M. Mount,et al.  The genome sequence of Drosophila melanogaster. , 2000, Science.

[90]  S. Knudsen,et al.  Prediction of human mRNA donor and acceptor sites from the DNA sequence. , 1991, Journal of molecular biology.

[91]  Philipp Bucher,et al.  The Eukaryotic Promoter Database (EPD) , 2000, Nucleic Acids Res..

[92]  R. Kornberg,et al.  Eukaryotic transcriptional control. , 1999, Trends in cell biology.

[93]  J. Fickett,et al.  Assessment of protein coding measures. , 1992, Nucleic acids research.

[94]  P. Green,et al.  Base-calling of automated sequencer traces using phred. I. Accuracy assessment. , 1998, Genome research.

[95]  W S McCulloch,et al.  A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.

[96]  D. Haussler,et al.  Hidden Markov models in computational biology. Applications to protein modeling. , 1993, Journal of molecular biology.

[97]  S. Karlin,et al.  Prediction of complete gene structures in human genomic DNA. , 1997, Journal of molecular biology.

[98]  L. Baum,et al.  An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process , 1972 .

[99]  T. Sejnowski,et al.  Predicting the secondary structure of globular proteins using neural network models. , 1988, Journal of molecular biology.

[100]  R. Tjian,et al.  Diverse transcriptional functions of the multisubunit eukaryotic TFIID complex. , 1992, The Journal of biological chemistry.

[101]  David Haussler,et al.  Optimally Parsing a Sequence into Different Classes Based on Multiple Types of Evidence , 1994, ISMB.

[102]  Ying Xu,et al.  An Improved System for Exon Recognition and Gene Modeling in Human DNA Sequence , 1994, ISMB.

[103]  S F Altschul,et al.  Local alignment statistics. , 1996, Methods in enzymology.

[104]  A. D. McLachlan,et al.  Codon preference and its use in identifying protein coding regions in long DNA sequences , 1982, Nucleic Acids Res..

[105]  C. Harley,et al.  Analysis of E. coli promoter sequences. , 1987, Nucleic acids research.

[106]  N. Harris,et al.  Genotator: a workbench for sequence annotation. , 1997, Genome research.

[107]  Roland L. Dunbrack,et al.  Meeting review: the Second meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP2), Asilomar, California, December 13-16, 1996. , 1997, Folding & design.

[108]  D. Baltimore,et al.  The “initiator” as a transcription control element , 1989, Cell.

[109]  G. B. Hutchinson,et al.  The prediction of vertebrate promoter regions using differential hexamer frequency analysis , 1996, Comput. Appl. Biosci..

[110]  E. Snyder,et al.  Identification of protein coding regions in genomic DNA. , 1995, Journal of molecular biology.

[111]  P Bucher,et al.  Compilation and analysis of eukaryotic POL II promoter sequences. , 1986, Nucleic acids research.

[112]  David Haussler,et al.  Using Dirichlet Mixture Priors to Derive Hidden Markov Models for Protein Families , 1993, ISMB.

[113]  E. Uberbacher,et al.  Locating protein-coding regions in human DNA sequences by a multiple sensor-neural network approach. , 1991, Proceedings of the National Academy of Sciences of the United States of America.

[114]  D. K. Hawley,et al.  Compilation and analysis of Escherichia coli promoter DNA sequences. , 1983, Nucleic acids research.

[115]  Holger Karas,et al.  TRANSFAC: a database on transcription factors and their DNA binding sites , 1996, Nucleic Acids Res..

[116]  Victor V. Solovyev,et al.  Identification of Human Gene Structure Using Linear Discriminant Functions and Dynamic Programming , 1995, ISMB.

[117]  D Haussler,et al.  Integrating database homology in a probabilistic gene structure model. , 1997, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[118]  Michael C. O'Neill,et al.  Escherichia coli promoters: neural networks develop distinct descriptions in learning to search for promoters of different spacing classes , 1992, Nucleic Acids Res..

[119]  J. Fickett Recognition of protein coding regions in DNA sequences. , 1982, Nucleic acids research.

[120]  S Harbeck,et al.  Stochastic segment models of eukaryotic promoter regions. , 1999, Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing.

[121]  Jacob V. Maizel,et al.  Discriminant analysis of promoter regions in Escherichia coli sequences , 1988, Comput. Appl. Biosci..

[122]  Victor V. Solovyev,et al.  The Gene-Finder Computer Tools for Analysis of Human and Model Organisms Genome Sequences , 1997, ISMB.