A Toolbox for Learning from Relational Data with Propositional and Multi-instance Learners

Most databases employ the relational model for data storage. To use this data in a propositional learner, a propositionalization step has to take place. Similarly, the data has to be transformed to be amenable to a multi-instance learner. The Proper Toolbox contains an extended version of RELAGGS and the Multi-Instance Learning Kit MILK, and it can also combine the multi-instance data with aggregated data from RELAGGS. RELAGGS was extended to handle arbitrarily nested relations and to work with both primary keys and indices. For MILK, the relational model is flattened into a single table, and this data is fed into a multi-instance learner. Finally, REMILK combines the aggregated data produced by RELAGGS and the multi-instance data, flattened for MILK, into a single table that is once again the input for a multi-instance learner. Several well-known datasets are used for experiments that highlight the strengths and weaknesses of the different approaches.
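To make the three data representations concrete, the following minimal Python sketch builds each of them from a toy one-to-many schema. It is an illustration only, not the toolbox's implementation: the tables `customers` and `orders`, their columns, the helper names `aggregate`, `flatten`, and `combine`, and the particular aggregate functions (count, minimum, maximum, average) are all assumptions made for this example.

```python
from statistics import mean

# Hypothetical toy schema: one target table and one child table (1:n).
customers = [
    {"cust_id": 1, "age": 34, "label": "good"},
    {"cust_id": 2, "age": 51, "label": "bad"},
]
orders = [
    {"cust_id": 1, "amount": 20.0},
    {"cust_id": 1, "amount": 40.0},
    {"cust_id": 2, "amount": 5.0},
]


def children_of(cust_id):
    """Return the child rows that reference the given target row."""
    return [o for o in orders if o["cust_id"] == cust_id]


def aggregate(customer):
    """Propositional view: one row per target object with summary columns."""
    amounts = [o["amount"] for o in children_of(customer["cust_id"])]
    return {
        "cust_id": customer["cust_id"],
        "age": customer["age"],
        "orders_count": len(amounts),
        "amount_min": min(amounts) if amounts else None,
        "amount_max": max(amounts) if amounts else None,
        "amount_avg": mean(amounts) if amounts else None,
        "label": customer["label"],
    }


def flatten(customer):
    """Multi-instance view: a bag with one instance per joined child row."""
    return [
        {
            "cust_id": customer["cust_id"],
            "age": customer["age"],
            "amount": o["amount"],
            "label": customer["label"],
        }
        for o in children_of(customer["cust_id"])
    ]


def combine(customer):
    """Combined view: every instance in the bag also carries the aggregates."""
    agg = aggregate(customer)
    return [{**agg, **instance} for instance in flatten(customer)]


# One row per customer (input for a propositional learner).
propositional = [aggregate(c) for c in customers]
# One bag of instances per customer (input for a multi-instance learner).
bags = {c["cust_id"]: flatten(c) for c in customers}
# Bags whose instances are extended with the aggregated columns.
combined_bags = {c["cust_id"]: combine(c) for c in customers}
```

In this sketch the aggregated view loses the individual order amounts but yields exactly one row per customer, while the flattened view keeps every order at the cost of repeating the customer's own attributes in each instance. A customer without orders would produce an empty bag here; how such cases are treated in practice is left open.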