GECKO is a genetic algorithm to classify and explore high throughput sequencing data

Comparative analysis of high-throughput sequencing data between multiple conditions often involves mapping sequencing reads to a reference, followed by downstream bioinformatics analyses. Both steps may introduce heavy bias and data loss. This is especially true in studies where patient transcriptomes or genomes may vary from their references, such as in cancer. Here we describe a novel approach and associated software that use advances in genetic algorithms and feature selection to comprehensively explore massive volumes of sequencing data, classifying and discovering new sequences of interest without a mapping step and without intensive use of specialized bioinformatics pipelines. We demonstrate that our approach, called GECKO for GEnetic Classification using k-mer Optimization, is effective at classifying and extracting meaningful sequences from multiple types of sequencing data, including mRNA, microRNA, and DNA methylome data.

Aubin Thomas, Sylvain Barriere et al. present a computational method for classifying and extracting meaningful sequences from high-throughput sequencing data. The method, called GECKO, uses k-mer counts to classify the input data with high accuracy.
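The general idea of combining k-mer counts with a genetic algorithm for feature selection can be illustrated with a minimal sketch. This is not the GECKO implementation: the function names (`count_kmers`, `build_matrix`, `loo_accuracy`, `evolve`), the toy reads, the nearest-centroid classifier used as the fitness function, and every parameter value are assumptions chosen for illustration only. Each individual in the population is a small set of k-mers, and its fitness is how well those k-mer counts separate the sample classes, with no mapping step involved.

```python
import random
from collections import Counter

def count_kmers(reads, k):
    """Count every overlapping k-mer across the reads of one sample."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def build_matrix(samples, k):
    """Return the sorted k-mer list and one row of counts per sample."""
    per_sample = [count_kmers(reads, k) for reads in samples]
    kmers = sorted(set().union(*per_sample))
    rows = [[c[km] for km in kmers] for c in per_sample]
    return kmers, rows

def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the chosen k-mer columns (the fitness function)."""
    correct = 0
    for i, row in enumerate(rows):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(labels)):
            members = [r for j, r in enumerate(rows)
                       if j != i and labels[j] == label]
            if not members:
                continue
            dist = 0.0
            for f in features:
                centroid = sum(m[f] for m in members) / len(members)
                dist += (row[f] - centroid) ** 2
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == labels[i]
    return correct / len(rows)

def evolve(rows, labels, n_features, subset_size=2,
           pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm: each individual is a list of k-mer column indices."""
    rng = random.Random(seed)
    population = [rng.sample(range(n_features), subset_size)
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda ind: loo_accuracy(rows, labels, ind),
                        reverse=True)
        parents = ranked[:pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = list(set(a) | set(b))
            child = rng.sample(pool, subset_size)  # crossover from parents' union
            if rng.random() < 0.3:  # mutation: replace one k-mer at random
                child[rng.randrange(subset_size)] = rng.randrange(n_features)
            children.append(child)
        population = parents + children
    best = max(population, key=lambda ind: loo_accuracy(rows, labels, ind))
    return best, loo_accuracy(rows, labels, best)

# Hypothetical toy data: the 3-mer "GGG" occurs only in the "tumor" samples.
samples = [
    ["ACGTGGGA", "ACGTAC"],
    ["TTGGGACG", "ACGTTC"],
    ["ACGTACGA", "ACGTAC"],
    ["TTACGACG", "ACGTTC"],
]
labels = ["tumor", "tumor", "normal", "normal"]

kmers, rows = build_matrix(samples, k=3)
best, accuracy = evolve(rows, labels, n_features=len(kmers))
print([kmers[i] for i in best], accuracy)
```

In this sketch the selected k-mers are themselves the output of interest: they can be inspected as candidate sequences that distinguish the conditions. Real data sets contain far more k-mers than this toy example, so a practical tool would need a dedicated k-mer counter and a much more efficient fitness evaluation.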
