Prediction of Primary Tumors in Cancers of Unknown Primary

Abstract
A cancer of unknown primary (CUP) is a metastatic cancer for which standard diagnostic tests fail to identify the location of the primary tumor. CUPs account for 3–5% of cancer cases. In such cases, using molecular data to determine the location of the primary tumor can help clinicians choose an appropriate treatment and thus improve clinical outcomes. In this paper, we present a new method that predicts the location of the primary tumor from gene expression data: locating cancers of unknown primary (LoCUP). The method models each sample as a mixture of normal and tumor cells. It therefore allows correct classification even in impure samples, where the tumor biopsy is contaminated by a large fraction of normal cells. On simulated low-purity metastatic samples, our method significantly increases classification accuracy, from 90.8% to 95.8%. It also shows promise on a small dataset of real metastasis samples with known origin.
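To illustrate why a mixture model helps with impure samples, the sketch below models an observed expression vector x as p·t_k + (1 − p)·n. Here t_k is a mean tumor profile for candidate primary site k, n is a normal-tissue profile, and p ∈ [0, 1] is the tumor purity. For each site it fits p by least squares, clipped to [0, 1], and it reports the site with the smallest residual. This is a hypothetical sketch, not the LoCUP implementation. The single shared normal profile, the per-site mean tumor profiles, the least-squares fit, and the function names `fit_purity` and `classify_primary` are all illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple

Profile = Sequence[float]


def fit_purity(x: Profile, tumor: Profile, normal: Profile) -> Tuple[float, float]:
    """Hypothetical helper: fit x ~ p*tumor + (1-p)*normal with p in [0, 1].

    Rewriting the model as (x - normal) ~ p * (tumor - normal) gives a
    one-parameter least-squares problem. Its objective is convex in p, so
    clipping the unconstrained optimum to [0, 1] yields the constrained optimum.
    Returns (purity, residual sum of squares).
    """
    d = [t - n for t, n in zip(tumor, normal)]  # tumor-minus-normal direction
    r = [xi - n for xi, n in zip(x, normal)]    # sample-minus-normal signal
    denom = sum(di * di for di in d)
    p = 0.0 if denom == 0.0 else sum(ri * di for ri, di in zip(r, d)) / denom
    p = min(1.0, max(0.0, p))                   # purity must lie in [0, 1]
    residual = sum((ri - p * di) ** 2 for ri, di in zip(r, d))
    return p, residual


def classify_primary(
    x: Profile, tumor_profiles: Dict[str, Profile], normal: Profile
) -> Tuple[str, float]:
    """Hypothetical classifier: choose the site whose mixture fits x best.

    Returns (predicted site, estimated purity for that site).
    """
    if not tumor_profiles:
        raise ValueError("need at least one candidate primary site")
    best_site, best_p, best_res = "", 0.0, float("inf")
    for site, tumor in tumor_profiles.items():
        p, res = fit_purity(x, tumor, normal)
        if res < best_res:
            best_site, best_p, best_res = site, p, res
    return best_site, best_p


if __name__ == "__main__":
    # Toy, made-up profiles over four genes (illustration only).
    normal = [1.0, 1.0, 1.0, 1.0]
    tumors = {
        "lung": [5.0, 1.0, 1.0, 1.0],
        "breast": [1.0, 5.0, 1.0, 1.0],
        "colon": [1.0, 1.0, 5.0, 1.0],
    }
    # A low-purity sample: roughly 30% breast tumor and 70% normal cells.
    sample = [1.0, 2.2, 1.0, 1.0]
    site, purity = classify_primary(sample, tumors, normal)
    print(site, f"{purity:.2f}")
```

With this toy input, the weak tumor signal is still attributed to the correct site, and the estimated purity is about 0.3. A real classifier would also need per-site variability and a more realistic normal-tissue model.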