Cancer Breakpoint Hotspots Versus Individual Breakpoints Prediction by Machine Learning Models

Genome rearrangement is a hallmark of all cancers. Predicting cancer breakpoints has proved to be a difficult task, and various machine learning models have not achieved high predictive power. We investigated how well machine learning models predict breakpoint hotspots selected with different density thresholds, and compared the prediction of hotspots with that of individual breakpoints. We found that hotspots are predicted considerably better than individual breakpoints. When choosing a hotspot selection criterion, the test ROC AUC alone is not sufficient to identify the best model; the lift of recall and the lift of precision should also be taken into consideration. Analysis of the lift of recall and the lift of precision showed that no single hotspot selection criterion suits all cancer types, but that there are three to four distinct groups of cancers with similar properties. Overall, the presented results point to the need to choose different hotspot selection criteria for different types of cancer.
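To make the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the genome is split into fixed windows with a breakpoint count per window, and that hotspots are the windows at or above a chosen quantile of those counts. The two lift definitions are also assumptions, since the abstract does not define them: lift of precision is taken as precision divided by the share of hotspots in the data, and lift of recall as recall divided by the share of windows flagged as positive. All function names and the toy data are hypothetical.

```python
def label_hotspots(counts, quantile):
    """Label a window 1 if its breakpoint count is at or above the given
    quantile of all window counts (and non-zero), else 0. Assumed criterion."""
    ordered = sorted(counts)
    cutoff = ordered[min(len(ordered) - 1, int(quantile * len(ordered)))]
    return [1 if (c >= cutoff and c > 0) else 0 for c in counts]


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random
    negative; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_metrics(labels, scores, top_fraction):
    """Flag the top-scoring fraction of windows as predicted hotspots and
    compare precision and recall with what random selection would give."""
    n = len(labels)
    k = max(1, round(top_fraction * n))           # number of flagged windows
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    tp = sum(labels[i] for i in ranked[:k])       # true hotspots among flagged
    positives = sum(labels)
    precision = tp / k
    recall = tp / positives
    return {
        "precision": precision,
        "recall": recall,
        # assumed definition: precision relative to hotspot prevalence
        "lift_of_precision": precision / (positives / n),
        # assumed definition: recall relative to the share of windows flagged
        "lift_of_recall": recall / (k / n),
    }


# Toy data (hypothetical): breakpoint counts and model scores for 10 windows.
counts = [0, 0, 0, 0, 1, 0, 2, 0, 5, 9]
scores = [0.1, 0.2, 0.1, 0.3, 0.4, 0.2, 0.9, 0.1, 0.8, 0.7]

labels = label_hotspots(counts, quantile=0.8)
auc = roc_auc(labels, scores)
metrics = lift_metrics(labels, scores, top_fraction=0.2)
print(labels, auc, metrics)
```

In this toy case the ROC AUC looks strong while only half of the flagged windows are true hotspots, which illustrates why a ranking metric alone may not settle the choice of model. Under the definitions assumed here the two lifts take the same value when computed at a single cutoff, so they would differ only when evaluated at different cutoffs.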
