pint: Pairwise integration of functional genomics data

Multiple types of genomic observations from the same patients are increasingly available in biomedical studies, including measurements of gene and miRNA expression levels, gene copy number, and methylation status. By investigating the dependencies between these data sets, it is possible to discover functional mechanisms and interactions that are not seen in the individual data sets. For example, integration of gene expression and copy number has been shown to reveal cancer-associated chromosomal regions and associated genes with potential diagnostic, prognostic and clinical impact [4].

We demonstrate how to integrate gene or miRNA expression with DNA copy number (aCGH) measurements to discover functionally active chromosomal aberrations. The models capture the shared signal in paired observations and indicate the affected genes and patients. The methods are potentially applicable to other types of biomedical data as well, including methylation, SNPs, alternative splicing and transcription factor binding, and to other application fields.

The package provides general-purpose tools for the discovery and analysis of statistical dependencies between co-occurring data sources. The methods are based on a principled framework, probabilistic canonical correlation analysis [2] and its extensions [1, 3, 4]. The probabilistic formulation deals rigorously with the uncertainty associated with the small sample sizes common in biomedical studies, and the package also provides tools to guide dependency modeling through Bayesian priors [4].
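
To make the idea of a "shared signal in paired observations" concrete, the following is a minimal sketch and not the package's own interface or inference procedure. It assumes a simple generative model in which one latent variable drives two synthetic data sets (stand-ins for, say, expression and copy number in a chromosomal region), and it recovers the dependency with classical canonical correlation analysis, to which the maximum-likelihood solution of probabilistic CCA is known to correspond. All function names, dimensions, weights and noise levels are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_paired_data(n_samples=200, dim_x=5, dim_y=4, noise=0.5, seed=0):
    """Draw two data sets that share one latent signal (illustrative model).

    x = z * w_x + noise, y = z * w_y + noise, with z ~ N(0, 1).
    The unit weights are an assumption made for this sketch only.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, 1))   # shared latent signal per sample
    w_x = np.ones((1, dim_x))                 # hypothetical loadings for data set x
    w_y = np.ones((1, dim_y))                 # hypothetical loadings for data set y
    x = z @ w_x + noise * rng.standard_normal((n_samples, dim_x))
    y = z @ w_y + noise * rng.standard_normal((n_samples, dim_y))
    return x, y


def canonical_correlations(x, y, ridge=1e-8):
    """Classical CCA: singular values of the whitened cross-covariance."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Sample covariances; a tiny ridge keeps the Cholesky factorization stable.
    c_xx = x.T @ x / (n - 1) + ridge * np.eye(x.shape[1])
    c_yy = y.T @ y / (n - 1) + ridge * np.eye(y.shape[1])
    c_xy = x.T @ y / (n - 1)
    l_x = np.linalg.cholesky(c_xx)            # c_xx = l_x @ l_x.T
    l_y = np.linalg.cholesky(c_yy)            # c_yy = l_y @ l_y.T
    # Whitened cross-covariance: inv(l_x) @ c_xy @ inv(l_y).T
    whitened = np.linalg.solve(l_x, c_xy) @ np.linalg.inv(l_y).T
    return np.linalg.svd(whitened, compute_uv=False)


if __name__ == "__main__":
    x, y = simulate_paired_data()
    # One large canonical correlation reflects the single shared signal;
    # the remaining ones reflect sampling noise only.
    print(np.round(canonical_correlations(x, y), 3))
```

In this sketch a single dominant canonical correlation indicates one shared component between the two data sets. A probabilistic treatment, as described above, additionally models the data-set-specific noise explicitly and allows priors to constrain the dependency.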