Combining Experimental Evidences from Replicates and Nearby Species Data for Annotating Novel Genomes

For several years now, high-throughput experiments have produced exponentially growing amounts of life science data (e.g., complete genome sequences, 3D structures, DNA chip data, mass spectrometry data). Analyzing such complex, voluminous, and heterogeneous data, and guiding that analysis with a statistically and mathematically sound methodology, is therefore of paramount importance. Here we make and justify the observation that experimental replicates and phylogenetic data can be combined to strengthen the evidence for identifying transcriptional motifs, a task that appears to be quite difficult with other current methods. We present a case study using sequence and microarray data from fungal species. Although we show that our methodology can be of immediate practical use to bioinformaticians and biologists annotating new genomes, we also focus on discussing the interesting mathematical problems that high-throughput data integration poses.