Multiple types of genomic observations from the same patients are increasingly available in biomedical studies, including measurements of gene and miRNA expression levels, gene copy number, and methylation status. Investigating the dependencies between these data sets makes it possible to discover functional mechanisms and interactions that are not visible in any individual data set. For example, integration of gene expression and copy number has been shown to reveal cancer-associated chromosomal regions and genes with potential diagnostic, prognostic and clinical impact [4]. We demonstrate how to integrate gene or micro-RNA expression with DNA copy number (aCGH) measurements to discover functionally active chromosomal aberrations. The models capture the signal shared by the paired observations and indicate the affected genes and patients. The methods may also apply to other types of biomedical data, including methylation, SNPs, alternative splicing and transcription factor binding, and to other application fields. The package provides general-purpose tools for the discovery and analysis of statistical dependencies between co-occurring data sources. The methods are based on a principled framework, probabilistic canonical correlation analysis [2], and its extensions [1, 3, 4]. The probabilistic formulation deals rigorously with the uncertainty associated with the small sample sizes common in biomedical studies, and the package also provides tools to guide dependency modeling through Bayesian priors [4].
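To make the idea of a shared signal in paired observations concrete, the following is a minimal sketch that uses classical canonical correlation analysis as a stand-in for the probabilistic models described above. It is not the package's API: the function name `canonical_correlations`, the ridge term `reg`, and the simulated "expression" and "copy number" matrices are all assumptions made for illustration only.

```python
import numpy as np

def canonical_correlations(X, Y, reg=1e-6):
    """Classical CCA on paired data (hypothetical helper, not the package API).

    X and Y are (samples x features) matrices measured on the same samples.
    Returns the canonical correlations in decreasing order.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample covariances; a small ridge term keeps them positive definite.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    # Whiten both sides with Cholesky factors: M = Lx^{-1} Cxy Ly^{-T}.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    # Singular values of the whitened cross-covariance are the correlations.
    return np.linalg.svd(M, compute_uv=False)

rng = np.random.default_rng(0)
n_samples = 200

# Simulated paired data: one latent factor drives both data sets.
z = rng.normal(size=(n_samples, 1))
expr = z @ np.ones((1, 5)) + 0.5 * rng.normal(size=(n_samples, 5))
copy_number = z @ np.ones((1, 4)) + 0.5 * rng.normal(size=(n_samples, 4))

# Control: a second data set generated independently of the first.
unrelated = rng.normal(size=(n_samples, 4))

rho_dep = canonical_correlations(expr, copy_number)
rho_indep = canonical_correlations(expr, unrelated)
print("dependent pair:  ", np.round(rho_dep, 3))
print("independent pair:", np.round(rho_indep, 3))
```

In this toy setting the leading canonical correlation is high only for the pair that shares a latent factor. The probabilistic formulation cited above replaces these point estimates with a latent-variable model, which is what allows uncertainty and priors to be handled explicitly.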
[1] Michael I. Jordan, et al. A Probabilistic Interpretation of Canonical Correlation Analysis. 2005.
[2] S. Knuutila, et al. Integrated gene copy number and expression microarray analysis of gastric cancer highlights potential target genes. International Journal of Cancer, 2008.
[3] Samuel Kaski, et al. Dependency detection with similarity constraints. 2009 IEEE International Workshop on Machine Learning for Signal Processing, 2009.
[4] Michel Verleysen, et al. Robust probabilistic projections. ICML, 2006.
[5] Samuel Kaski, et al. Probabilistic approach to detecting dependencies between data sets. Neurocomputing, 2008.