A Next-generation Sequence Clustering Method for E. Coli through Proteomics-genomics Data Mapping

Abstract: The recent publication of various ‘omics’ data sets has created new challenges and opportunities for developing novel approaches to the assembly of next-generation sequences. To improve the quality of assembled sequences, we developed a next-generation sequence clustering method that exploits the interdependency between genomics and proteomics data, which has so far been underused in this field. Given a set of next-generation sequencing reads and a set of protein sequences, our method clusters the reads by mapping them to the protein sequences. As a preliminary study, we selected Escherichia coli (E. coli) as our target species and simulated next-generation reads of E. coli, then evaluated our method by analyzing the actual adjacency of the clustered reads in the E. coli genome. We found that (i) a read base matching (RBM) ratio, which represents the proportion of bases in a read that are mapped to a protein sequence, above a threshold of 50 to 70% is a useful criterion for effective read clustering, and (ii) a higher RBM ratio does not always lead to better cluster quality in the case of E. coli. These preliminary results demonstrate that the integrative approach is simple yet has great potential for clustering reads that are adjacent in a genome.
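The clustering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the `Hit` record layout, the function names `rbm_ratio` and `cluster_reads`, the choice to assign each read to its single best-matching protein, and the default threshold of 0.5 (taken from the lower end of the 50 to 70% range reported above). The alignments themselves are assumed to come from some read-to-protein mapping step that is not shown here.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One read-to-protein alignment (hypothetical record layout)."""
    read_id: str
    protein_id: str
    read_start: int   # 1-based, inclusive start of the aligned span on the read
    read_end: int     # 1-based, inclusive end of the aligned span on the read
    read_length: int  # total number of bases in the read


def rbm_ratio(hit: Hit) -> float:
    """Fraction of the read's bases covered by the alignment (assumed RBM definition)."""
    # Reverse-strand hits may be reported with start > end, so order the coordinates first.
    lo, hi = sorted((hit.read_start, hit.read_end))
    return (hi - lo + 1) / hit.read_length


def cluster_reads(hits, min_rbm=0.5):
    """Group reads by best-matching protein, keeping only hits with RBM ratio >= min_rbm."""
    best = {}  # read_id -> (protein_id, ratio) of the best hit seen so far
    for hit in hits:
        ratio = rbm_ratio(hit)
        if ratio < min_rbm:
            continue
        if hit.read_id not in best or ratio > best[hit.read_id][1]:
            best[hit.read_id] = (hit.protein_id, ratio)

    clusters = defaultdict(list)
    for read_id, (protein_id, _) in best.items():
        clusters[protein_id].append(read_id)
    return dict(clusters)


# Toy data, invented for this example only.
hits = [
    Hit("read1", "protA", 1, 90, 100),    # RBM ratio 0.90
    Hit("read2", "protA", 100, 31, 100),  # reverse-strand hit, RBM ratio 0.70
    Hit("read3", "protB", 1, 80, 100),    # RBM ratio 0.80
    Hit("read3", "protA", 10, 69, 100),   # RBM ratio 0.60, loses to protB
    Hit("read4", "protB", 1, 30, 100),    # RBM ratio 0.30, below threshold
]
print(cluster_reads(hits, min_rbm=0.5))
# {'protA': ['read1', 'read2'], 'protB': ['read3']}
```

Reads placed in the same cluster are candidates for being adjacent in the genome, since they map to the same protein. Raising `min_rbm` makes clusters stricter but smaller, which is one way to explore finding (ii), that a higher RBM ratio does not necessarily give better clusters.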
