Algorithms meet sequencing technologies – 10th edition of the RECOMB-Seq workshop

Summary: DNA and RNA sequencing is a core technology in biological and medical research. The high throughput of these technologies and the steady stream of new experimental assays and biotechnologies demand the continuous development of methods to analyze the resulting data. The RECOMB Satellite Workshop on Massively Parallel Sequencing brings together leading researchers in computational genomics to discuss emerging frontiers in algorithm development for massively parallel sequencing data. The 10th meeting in this series, RECOMB-Seq 2020, was scheduled to be held in Padua, Italy, but because of the ongoing COVID-19 pandemic it was held virtually instead. The online workshop featured keynote talks by Paola Bonizzoni and Zamin Iqbal, two highlight talks, ten regular talks, and three short talks. Seven of the works presented at the workshop are featured in this edition of iScience, and many of the talks are available online on the RECOMB-Seq 2020 YouTube channel.
