ILP Challenge 2005: The Safarii MRDM environment

In this paper, we provide some preliminary results for the ILP Challenge 2005, which concerns a genetic database of information related to the function of a range of yeast genes. The yeast database consists of multiple tables, and hence poses a multi-relational problem. We demonstrate how the MRDM packages Safarii (mining) and ProSafarii (pre-processing) can be used to mine this data. We provide biological justification for the results obtained.

This paper considers the database of genes that make up the genome of the yeast Saccharomyces cerevisiae, which is provided as a challenge for data miners in the context of the ILP 2005 conference (2). This database contains descriptions of individual genes, along with extensive background information such as the homology between pairs of yeast genes, secondary structure information, and homology with other genes that appear in the SwissProt database. The database thus describes structured information, which makes the analysis multi-relational. The data is spread over a total of 11 tables.

We describe a Data Mining exercise based on the Multi-Relational Data Mining framework implemented in the Safarii package, developed by the first two authors. Safarii provides a number of algorithms that work on multi-relational data stored in a relational database. The mining package is supported by a pre-processing tool known as ProSafarii.

In the next section, we give an overview of how our MRDM framework and the two software packages work. Section 3 gives an overview of the structure of the yeast database. In Section 4 we describe the Data Mining settings that were tried, as well as the pre-processing each required. Section 5 describes the results for the different settings, together with some biological interpretation of those results.

[1]  Hendrik Blockeel, et al. Multi-Relational Data Mining, 2005, Frontiers in Artificial Intelligence and Applications.

[2]  Silvio C. E. Tosatto, et al. The SSEA server for protein secondary structure alignment, 2005, Bioinform.