Caution regarding the specificities of pan-cancer microbial structure

The results published in Poore and Kopylova et al. 2020 [1] indicated that machine learning models could distinguish tumour types almost perfectly on the basis of their microbial composition. Whilst we believe that microbial composition has the potential to be used in this manner, we have concerns about the manuscript that lead us to question the certainty of its conclusions. We believe there are issues in four areas: the contribution of contamination, the handling of batch effects, false-positive classifications, and limitations of the machine learning approaches used. Together, these issues make it difficult to determine whether the authors have captured true biological signal, and how robust their models would be as clinical biomarkers. We commend Poore and Kopylova et al. on their approach to open data and reproducibility, which has enabled this analysis. We hope that this discourse assists the future development of machine learning models and hypothesis generation in microbiome research.
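To show why batch effects make it hard to tell biological signal apart from artefact, here is a minimal sketch in Python using only the standard library. It is a hypothetical simulation. It is not the analysis of Poore and Kopylova et al., nor the analysis in this commentary. Two tumour types share "microbial" features that carry no biological difference, and a contaminant shift tied to processing batch is added. The function names, the offset of 4.0 and the sample sizes are illustrative assumptions. The nearest-centroid classifier was chosen for brevity and is not the model used in the original study. When batch is confounded with tumour type, the classifier looks highly accurate. When batches are assigned independently of tumour type, accuracy falls to chance.

```python
import random


def simulate_samples(n_per_type, confounded, offset=4.0, seed=0):
    """Simulate hypothetical abundance-like profiles for two tumour types.

    None of the three features differs between tumour types. The only
    structure is a batch-specific contaminant shift added to feature 0.
    If `confounded` is True, each tumour type is processed in its own batch;
    otherwise batches are assigned at random.
    """
    rng = random.Random(seed)
    samples = []
    for tumour_type in (0, 1):
        for _ in range(n_per_type):
            batch = tumour_type if confounded else rng.randint(0, 1)
            features = [rng.gauss(0.0, 1.0) for _ in range(3)]
            features[0] += offset * batch  # contaminant signal tied to batch, not biology
            samples.append((features, tumour_type))
    rng.shuffle(samples)
    return samples


def nearest_centroid_accuracy(samples, train_frac=0.7):
    """Fit a nearest-centroid classifier on a random split and return test accuracy."""
    cut = int(len(samples) * train_frac)
    train, test = samples[:cut], samples[cut:]
    centroids = {}
    for label in (0, 1):
        rows = [f for f, y in train if y == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]

    def predict(features):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(features, centroids[c])),
        )

    correct = sum(predict(f) == y for f, y in test)
    return correct / len(test)


confounded_acc = nearest_centroid_accuracy(simulate_samples(500, confounded=True))
balanced_acc = nearest_centroid_accuracy(simulate_samples(500, confounded=False))
print(f"batch confounded with tumour type: {confounded_acc:.2f}")
print(f"batch independent of tumour type:  {balanced_acc:.2f}")
```

In this toy setting, high accuracy on a random train/test split does not by itself show tumour-specific signal. The same figure can arise entirely from how samples were batched.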

[1] William Stafford Noble, et al. Navigating the pitfalls of applying machine learning in genomics. Nature Reviews Genetics, 2021.

[2] Donovan H. Parks, et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Research, 2021.

[3] F. Hildebrand, et al. Much ado about nothing? Off-target amplification can lead to false-positive bacterial brain microbiome detection in healthy and Parkinson’s disease individuals. Microbiome, 2021.

[4] D. Charnock-Jones, et al. Batch effects account for the main findings of an in utero human intestinal bacterial colonization study. Microbiome, 2021.

[5] Anders B. Dohlman, et al. The cancer microbiome atlas: a pan-cancer comparative analysis to distinguish tissue-resident microbiota from contaminants. Cell Host & Microbe, 2020.

[6] Trevor C. Charles, et al. Correction to: Microbiome definition re-visited: old concepts and new challenges. Microbiome, 2020.

[7] Rob Knight, et al. Microbiome analyses of blood and tissues suggest cancer diagnostic approach. Nature, 2020.

[8] S. Short, et al. Diversity of Viruses Infecting Eukaryotic Algae. Current Issues in Molecular Biology, 2020.

[9] Chenyan Zhou, et al. Leucothrix sargassi sp. nov., isolated from a marine alga [Sargassum natans (L.) Gaillon]. International Journal of Systematic and Evolutionary Microbiology, 2019.

[10] Kohske Takahashi, et al. Welcome to the Tidyverse. Journal of Open Source Software, 2019.

[11] C. Cooper, et al. SEPATH: benchmarking the search for pathogens in human tissue whole genome sequence data leads to template pipelines. Genome Biology, 2019.

[12] D. Charnock-Jones, et al. Author Correction: Human placenta has no microbiome but can contain potential pathogens. Nature, 2019.

[13] B. Larsen. Faculty Opinions recommendation of Human placenta has no microbiome but can contain potential pathogens. Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature, 2019.

[14] Florian P. Breitwieser, et al. A review of methods and databases for metagenomic classification and assembly. Briefings in Bioinformatics, 2019.

[15] Rob Knight, et al. Contamination in Low Microbial Biomass Microbiome Studies: Issues and Recommendations. Trends in Microbiology, 2019.

[16] R. Eils, et al. The landscape of viral associations in human cancers. bioRxiv, 2018.

[17] D. Charnock-Jones, et al. Recognizing the reagent microbiome. Nature Microbiology, 2018.

[18] Luke R. Thompson, et al. Best practices for analysing microbiomes. Nature Reviews Microbiology, 2018.

[19] J. McKeating, et al. Viral hepatitis and liver cancer. Philosophical Transactions of the Royal Society B: Biological Sciences, 2017.

[20] C. Wilke. Streamlined Plot Theme and Plot Annotations for 'ggplot2'. 2015.

[21] John H. E. Nash, et al. Taxonomic reassessment of N4-like viruses using comparative genomics and proteomics suggests a new subfamily - “Enquartavirinae”. Archives of Virology, 2015.

[22] Qingfa Wu, et al. Complete genome sequence of a novel velarivirus infecting areca palm in China. Archives of Virology, 2015.

[23] Georg K. Gerber, et al. The dynamic microbiome. FEBS Letters, 2014.

[24] Paul Turner, et al. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biology, 2014.

[25] M. Vaneechoutte, et al. Characterization of Newly Isolated Lytic Bacteriophages Active against Acinetobacter baumannii. PLoS ONE, 2014.

[26] J. Chiorini, et al. The family Parvoviridae. Archives of Virology, 2014.

[27] Steven P. Millard, et al. EnvStats: An R Package for Environmental Statistics. 2013.

[28] M. Vancanneyt, et al. Salinimicrobium marinum sp. nov., a halophilic bacterium of the family Flavobacteriaceae, and emended descriptions of the genus Salinimicrobium and Salinimicrobium catena. International Journal of Systematic and Evolutionary Microbiology, 2010.

[29] R. Knight, et al. Bacterial Community Variation in Human Body Habitats Across Space and Time. Science, 2009.

[30] H. Kasai, et al. Thalassomonas actiniarum sp. nov. and Thalassomonas haliotis sp. nov., isolated from marine animals. International Journal of Systematic and Evolutionary Microbiology, 2009.

[31] Harald Huber, et al. Ignicoccus hospitalis sp. nov., the host of 'Nanoarchaeum equitans'. International Journal of Systematic and Evolutionary Microbiology, 2007.

[32] F. Rabenstein, et al. An improved polyclonal antiserum for detecting Ryegrass mosaic rymovirus. Archives of Virology, 2005.

[33] D. Stenger, et al. Phylogenetic relationships, strain diversity and biogeography of tritimoviruses. The Journal of General Virology, 2002.

[34] Paul J. McMurdie, et al. Normalization of Microbiome Profiling Data. Methods in Molecular Biology, 2018.

[35] S. Weaver, et al. Insect-Specific Viruses: A Historical Overview and Recent Developments. Advances in Virus Research, 2017.