Prediction of the optimum combination of Solexa sequencing libraries in genome projects

DNA sequencing technology plays an important role in the life sciences, and Illumina's Solexa sequencer in particular is used in a growing number of genome projects. In these projects, Solexa libraries are usually constructed with insert sizes of 200 bp, 500 bp, 2 kb, 5 kb and 10 kb. How to find the optimum combination of insert sizes and sequencing depths for these libraries remains an open problem. In this paper, we take the wild rice genome sequencing project as an example. A tool, SRSD, was used to simulate random Solexa libraries from the cultivated rice genome sequence, producing 200 bp, 500 bp, 2 kb, 5 kb and 10 kb libraries at different depths. The simulated libraries were assembled, and the optimum combination was predicted by comparing the contig N50 and scaffold N50 of the resulting assemblies. This combination consists mainly of a 24X 500 bp library, a 6X 2 kb library, a 4X 5 kb library and a 4X 10 kb library. Assembled with SOAPdenovo, these reads would yield a 320 Mbp rice genome assembly with a contig N50 of 7.8 kb and a scaffold N50 of 185.3 kb. The results also suggest that the 500 bp library is more useful for sequence assembly than the 200 bp library. This approach provides an effective guide for genome projects that use the Solexa sequencer, and it could greatly reduce cost and improve the quality of genome assemblies.
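The two quantities the abstract relies on, sequencing depth and N50, can be illustrated with a minimal Python sketch. This is not SRSD's implementation, which is not described here. The function names, the uniform random placement of fragments, the fixed insert size, and the 100 bp read length used in the usage note are all assumptions made for illustration only.

```python
import math
import random

# Translation table used to complement a DNA string.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_pairs_for_depth(genome_size, depth, read_length):
    """Number of read pairs needed so that the total sequenced bases
    equal depth * genome_size (each pair contributes 2 * read_length)."""
    return math.ceil(genome_size * depth / (2 * read_length))


def simulate_pairs(genome, insert_size, read_length, depth, seed=0):
    """Hypothetical simulator: draw fragments of a fixed insert size at
    uniformly random positions and return the two end reads of each."""
    if not read_length <= insert_size <= len(genome):
        raise ValueError("need read_length <= insert_size <= genome length")
    rng = random.Random(seed)
    n_pairs = read_pairs_for_depth(len(genome), depth, read_length)
    last_start = len(genome) - insert_size
    pairs = []
    for _ in range(n_pairs):
        start = rng.randint(0, last_start)  # both bounds inclusive
        fragment = genome[start:start + insert_size]
        forward = fragment[:read_length]
        # The second read comes from the opposite strand at the far end.
        reverse = reverse_complement(fragment[-read_length:])
        pairs.append((forward, reverse))
    return pairs


def n50(lengths):
    """N50: the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

Under the assumed 100 bp read length, `read_pairs_for_depth(320_000_000, 24, 100)` gives the number of read pairs that a 24X library of a 320 Mbp genome would require. The same `n50` function applies to both contig and scaffold lengths, since the two metrics differ only in which sequences are measured.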