Using Multiple Levels of Learning and Diverse Evidence to Uncover Coordinately Controlled Genes

Now that the complete genomes of numerous organisms have been determined, a key problem in computational molecular biology is uncovering the relationships that exist among the genes in each organism and the regulatory mechanisms that control their operation. We are developing computational methods for discovering such regulatory mechanisms and relationships. Toward this end, we have developed a machine learning approach to identifying sets of genes that are coordinately controlled in the E. coli genome. A number of factors make this an interesting application for machine learning: (i) there is a rich variety of data types that provide useful evidence for this task, (ii) the overall problem of uncovering regulatory mechanisms can be decomposed into multiple machine learning subtasks operating at different levels of detail, (iii) there are no known negative training examples, and (iv) some of the features are misleading in their predictiveness.