PhyliCS: a Python library to explore scCNA data and quantify spatial tumor heterogeneity

Background: Tumors are composed of multiple cancer cell subpopulations (subclones), each characterized by a distinguishable set of mutations. This phenomenon, known as intra-tumor heterogeneity (ITH), may be studied using Copy Number Aberrations (CNAs). ITH can now be assessed at the highest possible resolution using single-cell DNA (scDNA) sequencing technology. Additionally, single-cell CNA (scCNA) profiles from multiple samples of the same tumor can in principle be exploited to study the spatial distribution of subclones within a tumor mass. However, since the technology required to generate large scDNA sequencing datasets is relatively recent, dedicated analytical approaches are still lacking.

Results: We present PhyliCS, the first tool that exploits scCNA data from multiple samples of the same tumor to estimate whether the different clones of a tumor are well mixed or spatially separated. Starting from the CNA data produced with third-party instruments, it computes a score, the Spatial Heterogeneity score, aimed at distinguishing spatially intermixed cell populations from spatially segregated ones. Additionally, it provides functionalities to facilitate scDNA analysis, such as feature selection and dimensionality reduction methods, visualization tools and a flexible clustering module.

Conclusions: PhyliCS represents a valuable instrument to explore the extent of spatial heterogeneity in multi-regional tumor sampling, exploiting the potential of scCNA data.
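The abstract does not define how the Spatial Heterogeneity score is computed. As a rough illustration of the idea only, the sketch below assumes a silhouette-style measure in which each cell's sample of origin is used as its group label: values near 1 suggest that samples hold distinct copy-number profiles (spatial segregation), while values near or below 0 suggest intermixing. The function names, the Euclidean distance, and the toy profiles are all assumptions made for this example and are not taken from the PhyliCS API.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two copy-number profiles (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(profiles, labels):
    """Mean silhouette coefficient of cells, grouped by sample-of-origin label.

    Hypothetical helper: a silhouette-style stand-in for a spatial
    heterogeneity measure, not the published score's implementation.
    """
    groups = sorted(set(labels))
    if len(groups) < 2:
        raise ValueError("at least two samples are required")

    scores = []
    for i, cell in enumerate(profiles):
        # Mean distance from cell i to the cells of each sample (itself excluded).
        mean_dist = {}
        for g in groups:
            dists = [
                euclidean(cell, other)
                for j, other in enumerate(profiles)
                if labels[j] == g and j != i
            ]
            mean_dist[g] = sum(dists) / len(dists) if dists else None

        a = mean_dist[labels[i]]  # cohesion: own sample
        if a is None:
            scores.append(0.0)  # a sample with a single cell scores 0 by convention
            continue
        # Separation: closest other sample.
        b = min(v for g, v in mean_dist.items() if g != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)

    return sum(scores) / len(scores)


# Toy copy-number profiles (4 genomic bins per cell): two distinct clones.
profiles = [[2, 2, 2, 2], [2, 2, 2, 3], [4, 4, 1, 2], [4, 4, 1, 1]]

# Segregated case: each sample contains only one clone.
segregated = mean_silhouette(profiles, ["A", "A", "B", "B"])
# Intermixed case: each sample contains cells from both clones.
intermixed = mean_silhouette(profiles, ["A", "B", "A", "B"])

print(f"segregated samples: {segregated:.2f}")
print(f"intermixed samples: {intermixed:.2f}")
```

On this toy input the segregated labelling yields a clearly positive value and the intermixed labelling a negative one, which is the qualitative contrast a score of this kind is meant to capture.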
