To the Editor: There is a great need for easy-to-use cancer genomics visualization tools for both large public data resources, such as TCGA (The Cancer Genome Atlas)1 and the GDC (Genomic Data Commons)2, and smaller-scale datasets generated by individual labs. Commonly used interactive visualization tools are either web-based portals or desktop applications. Data portals have a dedicated back end and are a powerful means of viewing centrally hosted resource datasets (for example, Xena’s predecessor, the University of California, Santa Cruz (UCSC) Cancer Browser (currently retired3), cBioPortal4, the ICGC (International Cancer Genome Consortium) Data Portal5 and the GDC Data Portal2). However, researchers wishing to use a data portal to explore their own data have to either redeploy the entire platform, a difficult task even for bioinformaticians, or upload private data to a server outside the user’s control (for example, MAGI (Mutation Annotation and Genome Interpretation6), WebMeV7 or Ordino8), a non-starter for protected patient data such as germline variants. Desktop tools can view a user’s own data securely (for example, Integrative Genomics Viewer (IGV)9, Gitools10), but lack well-maintained, prebuilt files for the ever-evolving and expanding public data resources. This dichotomy between data portals and desktop tools highlights the challenge of using a single platform for both large public data and smaller-scale datasets generated by individual labs.

Complicating this dichotomy is the growing volume and complexity of cancer genomics data resulting from numerous technological advances, including lower-cost high-throughput sequencing and single-cell-based technologies. Cancer genomics datasets are now being generated using new assays, such as whole-genome sequencing11, DNA methylation whole-genome bisulfite sequencing12 and ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing13).
Visualizing and exploring these diverse data modalities is important but challenging, especially as many tools have traditionally specialized in only one or a few data types. Although these complex datasets generate insights individually, integration with other omics datasets is crucial to help researchers discover and validate findings.

UCSC Xena was developed as a high-performance visualization and analysis tool for both large public repositories and private datasets. It was built to scale with current and future growth in data volume and complexity. Xena’s privacy-aware architecture enables cancer researchers of all computational backgrounds to explore large, diverse datasets. Researchers use the same system to securely explore their own data, together with or separately from the public data. The system easily supports many tens of thousands of samples and has been tested with up to a million cells. The simple and flexible architecture supports a variety of common and uncommon data types. Xena’s Visual Spreadsheet visualization integrates gene-centric and genomic-coordinate-centric views across multiple data modalities, providing a deep, comprehensive view of genomic events within a cohort of tumors.

UCSC Xena (http://xena.ucsc.edu) has two components: the front end Xena Browser and the back end Xena Hubs (Fig. 1). The web-based Xena Browser empowers biologists to explore data across multiple Xena Hubs with a variety of visualizations and analyses. The back end Xena Hubs host genomics data and can run on laptops, on public servers, behind a firewall or in the cloud; each hub can be public or private (Supplementary Fig. 1). The Xena Browser receives data simultaneously from multiple Xena Hubs and integrates them into a single coherent visualization within the browser.

A private Xena Hub is a hub installed on a user’s own computer (Supplementary Fig. 2).
It is configured to respond only to requests from the computer’s localhost network interface (that is, http://127.0.0.1). This ensures that the hub communicates only with the computer on which it is installed.

A public hub is configured to respond to requests from external computers. There are two types of public Xena Hubs (Supplementary Fig. 2). The first type is an open-public hub, which is accessible by everyone. While we host several open-public hubs (Supplementary Table 1), users can also set up their own as a way to share data. One example is the Treehouse Hub, set up by the Childhood Cancer Initiative to share pediatric cancer RNA-seq gene expression data (Supplementary Note). The second type
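
The difference between a private and a public hub described above comes down to which network interface the server listens on. The following is a minimal Python sketch of that idea only; it is not Xena Hub code. The names `HubHandler`, `start_hub`, `fetch` and the `DATASETS` dictionary are hypothetical and introduced purely for illustration. Binding to 127.0.0.1 means the operating system accepts connections only from the same machine, whereas binding to 0.0.0.0 would accept connections from external computers.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical stand-in for data held by a hub; not Xena's storage format.
DATASETS = {"example_expression": {"sample1": 2.5, "sample2": 0.7}}


class HubHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the toy dataset as JSON."""

    def do_GET(self):
        body = json.dumps(DATASETS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the sketch quiet


def start_hub(private=True, port=0):
    """Start a toy hub in a background thread and return the server.

    private=True binds to the loopback interface only (127.0.0.1);
    private=False binds to all interfaces (0.0.0.0).
    port=0 lets the operating system pick a free port.
    """
    host = "127.0.0.1" if private else "0.0.0.0"
    server = ThreadingHTTPServer((host, port), HubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server):
    """Query the hub from the local machine, bypassing any configured proxy."""
    port = server.server_address[1]
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/") as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    hub = start_hub(private=True)
    print("listening on", hub.server_address)
    print(fetch(hub))
    hub.shutdown()
    hub.server_close()
```

In this sketch a request issued on the same machine succeeds, while a private hub is simply unreachable from other computers because nothing is listening on an externally visible interface. A real deployment would also need the access-control and data-handling logic that the sketch omits.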
[1] Allison P. Heath, et al. Toward a Shared Vision for Cancer Genomic Data. The New England Journal of Medicine, 2016.
[2] Mauro A. A. Castro, et al. The chromatin accessibility landscape of primary human cancers. Science, 2018.
[3] John Quackenbush, et al. WebMeV: a Cloud Platform for Analyzing and Visualizing Cancer Genomic Data. bioRxiv, 2017.
[4] Thomas Zichner, et al. Ordino: a visual cancer analysis tool for ranking and exploring genes, cell lines and tissue samples. Bioinform., 2019.
[5] Abhinav Nellore, et al. Cloud computing for genomic data analysis and collaboration. Nature Reviews Genetics, 2018.
[6] Mary Goldman, et al. Toil enables reproducible, open source, big biomedical data analyses. Nature Biotechnology, 2017.
[7] Helga Thorvaldsdóttir, et al. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings Bioinform., 2012.
[8] Steven J. M. Jones, et al. Pan-cancer analysis of whole genomes. Nature, 2020.
[9] Ting Wang, et al. The UCSC Cancer Genomics Browser. Nature Methods, 2009.
[10] Nuria Lopez-Bigas, et al. Gitools: Analysis and Visualisation of Genomic Data Using Interactive Heat-Maps. PLoS ONE, 2011.
[11] Arul M. Chinnaiyan, et al. Cancer transcriptome profiling at the juncture of clinical translation. Nature Reviews Genetics, 2017.
[12] Benjamin E. Gross, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discovery, 2012.
[13] Lincoln D. Stein, et al. The International Cancer Genome Consortium Data Portal. Nature Biotechnology, 2019.
[14] Xin Zhou, et al. Pan-cancer genome and transcriptome analyses of 1,699 pediatric leukemias and solid tumors. Nature, 2018.
[15] Nicola J. Rinaldi, et al. Genetic effects on gene expression across human tissues. Nature, 2017.
[16] Peter W. Laird, et al. Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell, 2018.
[17] Hui Shen, et al. DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nature Genetics, 2018.
[18] Benjamin J. Raphael, et al. MAGI: visualization and collaborative annotation of genomic aberrations. Nature Methods, 2015.
[19] L. Chin, et al. Making sense of cancer genomic data. Genes & Development, 2011.
[20] The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. 2020.
[21] Li Ding, et al. Driver Fusions and Their Implications in the Development and Treatment of Human Cancers. Cell Reports, 2018.