Sequencing by hybridization in the presence of hybridization errors.

DNA sequencing is a central problem in genomics, and several different sequencing methods are currently in use. One promising method, Sequencing by Hybridization (SBH), uses a sequencing chip to detect which subsequences are present in a DNA molecule and then reconstructs the sequence from these hybridization data. Preparata et al. proposed a new sequencing chip using universal bases, together with a new sequencing algorithm, and showed that its performance is significantly better than that of the standard scheme based on oligomer probes. However, they did not consider errors in the sequencing chip, so their method cannot be applied directly in practice. This paper proposes sequencing algorithms for their chip that work in the presence of hybridization errors, and applies these algorithms to random data with random errors. Computational results show that false negative errors reduce the rate of correct reconstruction more than false positive errors do. Our extended sequencing algorithms are useful when the number of hybridization errors is small.
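To make the two error types concrete, the following is a minimal Python sketch of the hybridization data an SBH chip reports, assuming the standard scheme of solid oligomer probes of length k rather than the universal-base probes of Preparata et al. The function names `spectrum` and `noisy_spectrum`, the uniform random error model, and the example sequence are illustrative assumptions, not taken from the paper.

```python
import itertools
import random


def spectrum(seq, k):
    """Ideal chip readout: the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def noisy_spectrum(seq, k, n_false_neg, n_false_pos, seed=0):
    """Simulate a readout with hybridization errors (assumed uniform model).

    A false negative is a probe that occurs in seq but is not reported;
    a false positive is a probe that is reported but does not occur in seq.
    """
    rng = random.Random(seed)
    ideal = spectrum(seq, k)
    # All k-mers over the DNA alphabet that do NOT occur in seq.
    all_probes = {"".join(p) for p in itertools.product("ACGT", repeat=k)}
    absent = sorted(all_probes - ideal)
    dropped = set(rng.sample(sorted(ideal), n_false_neg))   # false negatives
    spurious = set(rng.sample(absent, n_false_pos))         # false positives
    return (ideal - dropped) | spurious


if __name__ == "__main__":
    target = "ACGTTGCA"  # hypothetical example sequence
    ideal = spectrum(target, 3)
    observed = noisy_spectrum(target, 3, n_false_neg=1, n_false_pos=2)
    print("false negatives:", sorted(ideal - observed))
    print("false positives:", sorted(observed - ideal))
```

A false negative removes a probe that a reconstruction algorithm needs in order to extend the sequence, whereas a false positive only adds a candidate extension that can often be ruled out; this toy model does not reproduce the paper's experiments, but it shows what each error type does to the input.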