A space efficient representation for sparse de Bruijn subgraphs

De Bruijn graphs arise naturally in the study of strings, so the rise of de Bruijn graph-based approaches to sequence analysis is no surprise. The problem is that, for most of their applications in bioinformatics, these graphs are too large even for small genomes. One way to overcome this is to compress branch-free paths into single nodes. Although this compression is a common first step in many de Bruijn graph-based approaches, the direct construction of the compressed graph from raw data does not appear to have been documented before. In our experience, implementing the construction of such graphs is tricky and time-consuming, even though it rests on simple operations. In this report we therefore briefly describe our graph construction algorithm, in the hope that the details given will help the reader avoid some of the pitfalls we encountered.
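
To make the idea of compressing branch-free paths concrete, the following is a minimal Python sketch. It is not the construction algorithm of this report: it first collects all k-mers and only then merges them, which is the naive two-step route rather than a direct construction. The function names (`kmers_of`, `successors`, `predecessors`, `compress`), the DNA alphabet, and the choice to ignore reverse complements are all assumptions made for illustration.

```python
ALPHABET = "ACGT"  # assumed alphabet; reverse complements are ignored here


def kmers_of(reads, k):
    """Return the set of all distinct k-mers occurring in the reads."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def successors(kmer, kmers):
    """k-mers in the set that overlap the (k-1)-suffix of `kmer`."""
    return [kmer[1:] + c for c in ALPHABET if kmer[1:] + c in kmers]


def predecessors(kmer, kmers):
    """k-mers in the set that overlap the (k-1)-prefix of `kmer`."""
    return [c + kmer[:-1] for c in ALPHABET if c + kmer[:-1] in kmers]


def compress(kmers):
    """Merge every maximal branch-free path of k-mers into one labelled node.

    Two k-mers u -> v are merged when v is the only successor of u
    and u is the only predecessor of v.
    """
    def next_on_path(u):
        s = successors(u, kmers)
        if len(s) == 1 and len(predecessors(s[0], kmers)) == 1:
            return s[0]
        return None

    def prev_on_path(v):
        p = predecessors(v, kmers)
        if len(p) == 1 and len(successors(p[0], kmers)) == 1:
            return p[0]
        return None

    seen = set()
    labels = []
    for start in sorted(kmers):
        if start in seen:
            continue
        # Walk backwards to the first k-mer of the path (stop on a cycle).
        first = start
        while True:
            p = prev_on_path(first)
            if p is None or p == start:
                break
            first = p
        # Walk forwards, appending one character per merged k-mer.
        label = first
        seen.add(first)
        cur = first
        while True:
            n = next_on_path(cur)
            if n is None or n in seen:
                break
            label += n[-1]
            seen.add(n)
            cur = n
        labels.append(label)
    return labels


if __name__ == "__main__":
    print(compress(kmers_of(["AACGT", "AACGG"], 3)))
```

In this toy input the k-mers `AAC` and `ACG` form a branch-free path and collapse into the single node `AACG`, while `CGG` and `CGT` stay separate because `ACG` branches. Holding every k-mer in memory before merging is exactly the cost that a direct construction of the compressed graph aims to avoid.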
