Building Decision Trees on Records Linked through Key References

We consider the classification problem in which the data are given as a collection of tables related by a hierarchical structure of key references, with the class labels contained in the root table. Each parent table represents a many-to-many relationship type among its child tables. Such data are frequently found in relational databases, data warehouses, XML data, and biological databases. One solution is to join all tables into a single universal table based on the recorded relationships, but the joined table suffers from a significant blowup in size caused by the many-to-many relationships. Another solution is to treat the problem as relational learning, at the cost of increased complexity and degraded performance. We propose a novel method that builds exactly the same decision tree classifier as the one built from the joined table, but without the blowup that the join-based approach requires.
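To make the blowup concrete, the following is a minimal Python sketch on hypothetical toy data. It is an illustration only and is not the algorithm proposed in this paper. It assumes a simplified schema in which two linked tables each hold several rows per root key, and all table contents and function names (`joined_counts`, `propagated_counts`) are invented for the example. It shows that the class counts a decision tree needs for scoring a split can be obtained from per-key tallies, and that they agree with the counts taken from the materialized join.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy data (assumption): root rows carry the class label,
# and each linked table may hold several rows per root key.
root = [(1, "yes"), (2, "no")]                                  # (key, label)
child_a = [(1, "red"), (1, "blue"), (1, "red"), (2, "blue")]    # (key, color)
child_b = [(1, "s"), (1, "m"), (2, "s"), (2, "m"), (2, "l")]    # (key, size)


def joined_counts(root, child_a, child_b):
    """Class counts per color, computed by enumerating the full join."""
    counts = Counter()
    for (k, label), (ka, color), (kb, _size) in product(root, child_a, child_b):
        if k == ka == kb:
            counts[(color, label)] += 1
    return counts


def propagated_counts(root, child_a, child_b):
    """The same counts, computed from per-key tallies without the join."""
    # Tally colors per key in child_a.
    a_by_key = defaultdict(Counter)
    for k, color in child_a:
        a_by_key[k][color] += 1
    # Number of child_b rows per key; each one multiplies the joined rows.
    b_rows = Counter(k for k, _size in child_b)
    counts = Counter()
    for k, label in root:
        for color, n in a_by_key[k].items():
            counts[(color, label)] += n * b_rows[k]
    return counts


if __name__ == "__main__":
    print(dict(joined_counts(root, child_a, child_b)))
    print(dict(propagated_counts(root, child_a, child_b)))
```

In this toy case the join holds 9 rows while the input tables hold 11 rows in total, so the saving is not visible at this scale. The number of joined rows per key is the product of the per-table row counts, however, so it grows multiplicatively as more linked tables are added, whereas the tallies grow only with the size of the input tables.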