A greedy great approach to learn with complementary structured datasets

We are interested in problems where two hierarchically structured datasets may be complementary for a learning process. This case may arise in biological applications where genomic and metagenomic analyses are collected to study the genomic features of an organism along with its environment. In this work, we propose a model to assess the relevant interactions between the two datasets. We use a compressed representation of the original data to cope with the high dimensionality of the problem. We show that the collection of models, characterized through the hierarchical structures, forms a partially ordered set, and we take advantage of this organization to define a greedy approach that solves the problem more efficiently. Finally, we illustrate the behavior of the resulting algorithm on numerical simulations.