OncodriveCLUSTL: a sequence-based clustering method to identify cancer drivers

Summary: The identification of the genomic alterations driving tumorigenesis is one of the main goals in oncogenomics research. Given the evolutionary principles of cancer development, computational methods that detect signals of positive selection in the pattern of tumor mutations have been effectively applied in the search for cancer genes. One of these signals is the abnormal clustering of mutations, which has been shown to be complementary to other signals in the detection of driver genes. We have developed OncodriveCLUSTL, a new sequence-based clustering algorithm to detect significant clustering signals across genomic regions. OncodriveCLUSTL is based on a local background model derived from the simulation of mutations, accounting for the composition of tri- or penta-nucleotide context substitutions observed in the cohort under study. Our method is able to identify known clusters and bona fide cancer drivers across cohorts of tumor whole-exomes, outperforming the existing OncodriveCLUST algorithm and complementing other methods based on different signals of positive selection. We show that OncodriveCLUSTL may be applied to the analysis of non-coding genomic elements and non-human mutation data.

Availability and implementation: OncodriveCLUSTL is available as an installable Python 3.5 package. The source code and running examples are freely available at https://bitbucket.org/bbglab/oncodriveclustl under the GNU Affero General Public License.

Contact: nuria.lopez@irbbarcelona.org
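
The idea of a local background model built from simulated mutations can be illustrated with a small sketch. This is not the OncodriveCLUSTL implementation: the function names, the sliding-window score, and the context probabilities are all assumptions made for illustration, and only trinucleotide contexts are handled. The sketch places the observed number of mutations at random within a region, weighting each position by an assumed probability for its trinucleotide context, and then asks how often the simulated mutations cluster at least as tightly as the observed ones.

```python
import random


def trinucleotide_weights(sequence, context_probs):
    """Weight each position by the (assumed) mutation probability of its
    trinucleotide context; the two edge positions have no full context
    and get weight 0."""
    weights = []
    for i in range(len(sequence)):
        if 0 < i < len(sequence) - 1:
            weights.append(context_probs.get(sequence[i - 1:i + 2], 0.0))
        else:
            weights.append(0.0)
    return weights


def clustering_score(positions, window):
    """Stand-in clustering score (hypothetical, not the published one):
    the largest number of mutations whose positions span fewer than
    `window` bases."""
    pts = sorted(positions)
    best, start = 0, 0
    for end in range(len(pts)):
        while pts[end] - pts[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def empirical_p_value(sequence, observed, context_probs,
                      window=5, n_sim=1000, seed=0):
    """Fraction of simulations whose score is at least the observed score,
    with a +1 pseudocount so the estimate is never exactly zero."""
    rng = random.Random(seed)
    weights = trinucleotide_weights(sequence, context_probs)
    if sum(weights) == 0:
        raise ValueError("no position has a non-zero context probability")
    observed_score = clustering_score(observed, window)
    indices = range(len(sequence))
    hits = 0
    for _ in range(n_sim):
        # Same number of mutations as observed, placed according to context.
        simulated = rng.choices(indices, weights=weights, k=len(observed))
        if clustering_score(simulated, window) >= observed_score:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Toy region with four equally likely contexts (illustrative values only).
region = "ACGT" * 50
probs = {"ACG": 1.0, "CGT": 1.0, "GTA": 1.0, "TAC": 1.0}
clustered_p = empirical_p_value(region, [100, 101, 102, 103, 104, 105], probs)
dispersed_p = empirical_p_value(region, [10, 50, 90, 130, 170, 190], probs)
```

In this toy setting the tightly packed mutations receive a small empirical p-value, while the evenly spread ones do not. In practice the context probabilities would be estimated from the substitutions observed in the cohort rather than fixed by hand.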
