Cyberbiosecurity Challenges of Pathogen Genome Databases

Pathogen detection, identification, and tracking are shifting from non-molecular methods, DNA fingerprinting methods, and single-gene methods to methods that rely on whole genomes. Ebola and influenza virus genome data are being used for real-time tracking, while food-borne bacterial outbreaks and hospital outbreaks are investigated using whole genomes in the UK, Canada, the USA, and other countries. Plant pathogen genomes are also starting to be used to investigate plant disease epidemics such as the wheat blast outbreak in Bangladesh. While these genome-based approaches offer unprecedented advantages over previous approaches for public health and biosecurity, they also bring new cybersecurity vulnerabilities and risks. The more we rely on genome databases, the more likely these databases are to become targets of cyberattacks that aim to interfere with public health and biosecurity systems by compromising their integrity, holding them hostage, or manipulating the data they contain. In addition, pathogen genomic data could be collected from infected individuals or from agricultural and food products during disease outbreaks to improve disease modeling and forecasting; protecting the privacy of the individuals, growers, and retailers involved is another major cyberbiosecurity challenge. As these data become linkable to other data sources, individuals and groups become identifiable, and malicious activities targeting them become feasible. Here, we identify a number of potential cybersecurity weaknesses in today's pathogen genome databases to raise awareness, and we propose potential solutions to strengthen cyberbiosecurity during the development of the next generation of pathogen genome databases.
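The linkage risk described above can be illustrated with k-anonymity, a standard privacy measure swapped in here purely for illustration. The Python sketch below uses entirely hypothetical sample metadata and a hypothetical external table (all field names, accessions, and values are invented). It counts how many records share each combination of quasi-identifiers (collection date, county, host age), shows that a record with a unique combination can be joined to a named individual, and shows how coarsening dates and ages raises k. It is a sketch under these assumptions, not a method proposed by the authors.

```python
from collections import Counter

# Hypothetical metadata attached to pathogen genome submissions.
# All field names and values are invented for illustration.
samples = [
    {"accession": "S1", "collection_date": "2018-03-02", "county": "Larimer", "host_age": 34},
    {"accession": "S2", "collection_date": "2018-03-02", "county": "Larimer", "host_age": 34},
    {"accession": "S3", "collection_date": "2018-03-05", "county": "Larimer", "host_age": 38},
]

# Hypothetical external source (e.g., a leaked clinic visit log) sharing some fields.
external = [
    {"name": "Patient A", "collection_date": "2018-03-05", "county": "Larimer", "host_age": 38},
    {"name": "Patient C", "collection_date": "2018-04-11", "county": "Weld", "host_age": 52},
]

QUASI_IDENTIFIERS = ("collection_date", "county", "host_age")


def key(record, quasi_ids):
    """Tuple of the record's quasi-identifier values, used for grouping and joining."""
    return tuple(record[q] for q in quasi_ids)


def k_anonymity(records, quasi_ids):
    """Size of the smallest group of records sharing identical quasi-identifiers.
    k == 1 means at least one record is unique on those fields."""
    return min(Counter(key(r, quasi_ids) for r in records).values())


def link(records, external_rows, quasi_ids):
    """Join records to an external table on the quasi-identifiers and return the
    accessions that match exactly one named external row (i.e., re-identified)."""
    index = {}
    for row in external_rows:
        index.setdefault(key(row, quasi_ids), []).append(row["name"])
    return {
        r["accession"]: index[key(r, quasi_ids)][0]
        for r in records
        if len(index.get(key(r, quasi_ids), [])) == 1
    }


def generalize(record):
    """Coarsen quasi-identifiers: keep only year-month and a 10-year age band."""
    return {**record,
            "collection_date": record["collection_date"][:7],
            "host_age": record["host_age"] // 10 * 10}


print(k_anonymity(samples, QUASI_IDENTIFIERS))                          # 1
print(link(samples, external, QUASI_IDENTIFIERS))                       # {'S3': 'Patient A'}
print(k_anonymity([generalize(r) for r in samples], QUASI_IDENTIFIERS))  # 3
```

Generalization trades precision for privacy: month-level dates and decade age bands may still be adequate for outbreak modeling while making each record indistinguishable from at least k-1 others, which blocks the exact-match join shown by `link`.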
