On the Informativeness of the DNA Promoter Sequences Domain Theory

The DNA promoter sequences domain theory and database have become popular for testing systems that integrate empirical and analytical learning. This note reports a simple change and reinterpretation of the domain theory in terms of M-of-N concepts, involving no learning, that results in an accuracy of 93.4% on the 106 items of the database. Moreover, an exhaustive search of the space of M-of-N domain theory interpretations indicates that the expected accuracy of a randomly chosen interpretation is 76.5%, and that a maximum accuracy of 97.2% is achieved in 12 cases. This demonstrates the informativeness of the domain theory, without the complications of understanding the interactions between various learning algorithms and the theory. In addition, our results help characterize the difficulty of learning using the DNA promoters theory.
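To make the M-of-N reinterpretation concrete, the following is a minimal sketch of how a rule with N antecedents can be read either conjunctively (all N must hold) or as an M-of-N concept (at least M must hold). The names `m_of_n`, `antecedent_matches`, `toy_pattern`, and `toy_sequence` are hypothetical, and the pattern is a toy illustration, not an antecedent list taken from the actual promoter domain theory.

```python
from typing import List, Mapping


def m_of_n(m: int, conditions: List[bool]) -> bool:
    """Return True when at least m of the given conditions hold."""
    return sum(conditions) >= m


def antecedent_matches(sequence: str, pattern: Mapping[int, str]) -> List[bool]:
    """Test each (position -> nucleotide) antecedent against the sequence."""
    return [sequence[pos] == base for pos, base in pattern.items()]


# Hypothetical six-antecedent rule and a sequence with one mismatch (position 3).
toy_pattern = {0: "t", 1: "t", 2: "g", 3: "a", 4: "c", 5: "a"}
toy_sequence = "ttgtca"

conds = antecedent_matches(toy_sequence, toy_pattern)

# Conjunctive reading: every antecedent must match (N-of-N).
strict = m_of_n(len(conds), conds)

# M-of-N reading: here any 4 of the 6 antecedents suffice (4 is an assumed threshold).
relaxed = m_of_n(4, conds)
```

Under the conjunctive reading a single mismatched nucleotide rejects the sequence, whereas the M-of-N reading tolerates it. Enumerating the possible thresholds M for each rule gives the kind of interpretation space that the exhaustive search described above ranges over.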
