A Genomic Data Fusion Framework to Exploit Rare and Common Variants for Association Discovery

Collapsing methods are used in association studies to capture the effect of rare genetic variants on disease. In this work we model an enriched collapsing approach that incorporates gene, protein domain, pathway and protein-protein interaction data. We applied the collapsing technique to a data set of 85 subjects with epilepsy (cases) and 61 healthy subjects (controls). The method retrieved 4 genes, 5 domains, 33 gene interactions and 14 pathways showing a significant association with the disease. The collapsed data were also used as features for prediction models. We found that using protein-protein interactions as model features increases the area under the ROC curve (+1.5%) compared with the gene-only approach.
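
To make the idea of collapsing concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It pools each subject's rare variants into a binary carrier indicator per unit (here a gene; a domain, pathway or interaction pair could be substituted by changing the mapping) and compares carrier counts between cases and controls. The abstract does not name the association statistic, so the one-sided Fisher exact test is an assumption, as are all function names, the single-valued variant-to-unit mapping and the toy data.

```python
from math import comb

def collapse(genotypes, variant_to_unit, rare_variants):
    """Map each subject to the set of units in which they carry
    at least one rare variant (binary collapsing)."""
    collapsed = {}
    for subject, variants in genotypes.items():
        collapsed[subject] = {
            variant_to_unit[v]
            for v in variants
            if v in rare_variants and v in variant_to_unit
        }
    return collapsed

def carrier_table(collapsed, cases, unit):
    """2x2 counts: (case carriers, case non-carriers,
    control carriers, control non-carriers)."""
    a = b = c = d = 0
    for subject, units in collapsed.items():
        carrier = unit in units
        if subject in cases:
            a, b = a + carrier, b + (not carrier)
        else:
            c, d = c + carrier, d + (not carrier)
    return a, b, c, d

def fisher_one_sided(a, b, c, d):
    """P(at least `a` carriers among cases) under the hypergeometric
    null, i.e. a test for carrier enrichment in cases (assumed statistic)."""
    n_cases, n_controls = a + b, c + d
    n_carriers = a + c
    total = comb(n_cases + n_controls, n_carriers)
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_cases, k) * comb(n_controls, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return tail / total

# Hypothetical toy data: subject -> variants carried.
genotypes = {
    "case1": {"v1", "v3"}, "case2": {"v2"}, "case3": {"v4"},
    "ctrl1": {"v3"}, "ctrl2": set(), "ctrl3": {"v4"},
}
variant_to_unit = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B", "v4": "GENE_A"}
rare_variants = {"v1", "v2", "v3"}  # v4 is treated as common and ignored here
cases = {"case1", "case2", "case3"}

collapsed = collapse(genotypes, variant_to_unit, rare_variants)
table = carrier_table(collapsed, cases, "GENE_A")
p_value = fisher_one_sided(*table)
```

The per-subject carrier indicators in `collapsed` are also the kind of binary features that could be passed to a prediction model, with one column per gene, domain, pathway or interaction.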