A non-parametric approach for estimating stromal contamination in cancer samples

Recent advances in DNA microarray technology provide detailed information on genomic aberrations in tumor cells. DNA copy number changes and loss of heterozygosity (LOH) are two types of genomic aberration that can be identified using SNP arrays. However, clinical tumor tissue is heterogeneous, and copy number analysis is severely affected when a sample contains a large proportion of normal stromal cells. Such contamination can cause the algorithms used to detect aberrations in tumor cells to fail. In this paper we introduce a non-parametric statistical approach that estimates the normal tissue contamination in a tumor sample and then recovers the true copy number profile of the cancer cells. The proposed method is tested on a large number of simulated datasets and one real dataset. The experimental results demonstrate the accuracy and robustness of the proposed method. We believe this tool will be very useful for researchers performing copy number analysis of heterogeneous tissues.
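To illustrate why stromal contamination distorts the measured signal, the following is a minimal sketch of the standard linear mixing model, in which the observed copy number is a weighted average of the normal (diploid) and tumor copy numbers. It is not the non-parametric estimator proposed in this paper; the function names, the `normal_fraction` parameter, and the diploid default of 2 are assumptions made for illustration only.

```python
def observed_copy_number(tumor_cn, normal_fraction, normal_cn=2.0):
    """Copy number measured in a mixed sample (assumed linear mixing model).

    normal_fraction is the proportion of normal stromal cells in the sample.
    """
    return normal_fraction * normal_cn + (1.0 - normal_fraction) * tumor_cn


def recover_tumor_copy_number(observed_cn, normal_fraction, normal_cn=2.0):
    """Invert the mixing model to recover the copy number in tumor cells.

    Assumes normal_fraction is already known (or has been estimated).
    """
    if not 0.0 <= normal_fraction < 1.0:
        raise ValueError("normal_fraction must be in [0, 1)")
    return (observed_cn - normal_fraction * normal_cn) / (1.0 - normal_fraction)


if __name__ == "__main__":
    # A single-copy loss (tumor copy number 1) in a sample with 40% normal cells
    mixed = observed_copy_number(tumor_cn=1.0, normal_fraction=0.4)
    print(mixed)
    print(recover_tumor_copy_number(mixed, normal_fraction=0.4))
```

Under this model, a single-copy loss in a sample with 40% normal cells is measured at roughly 1.4 copies rather than 1, which is the kind of attenuation that can cause aberration-detection algorithms to miss real events.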