Maximum likelihood principle for DNA copy number analysis

Microarray technologies have been used to measure DNA copy number data. The copy number represents the relative fluorescence intensity level between control and test DNA samples. Variations in this number may lead to many genetic diseases, such as cancer. Unfortunately, the observed copy numbers are corrupted by noise arising from experimental errors and limited probe accuracy, which makes the variations hard to detect. Different techniques have been proposed to denoise the data and to extract important features, such as the breakpoints of the variant regions. In this paper, we present a robust procedure for the analysis of DNA copy number data based on the maximum likelihood principle, using global information from the entire data record. We show that dynamic programming can be used to compute the DNA copy number estimates and to reduce the computational complexity. Furthermore, we employ the Minimum Description Length rule to estimate the number of unknown parameters. Using simulated and real data, we show that the proposed method outperforms popular commercial software and published algorithms.
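
The combination of maximum likelihood, dynamic programming, and the Minimum Description Length rule can be illustrated with a minimal sketch. It is not the paper's implementation and rests on assumptions the abstract does not state: the copy number profile is modeled as piecewise constant with white Gaussian noise (so the maximum likelihood fit minimizes the within-segment sum of squared errors), and the MDL score counts 2K - 1 parameters for K segments (K levels plus K - 1 breakpoints). The function names `ml_segmentation` and `mdl_select` are hypothetical.

```python
import math


def _sse_function(y):
    """Return sse(i, j): squared error of y[i:j] around its own mean, in O(1)."""
    s1 = [0.0]  # prefix sums of y
    s2 = [0.0]  # prefix sums of y squared
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(i, j):
        total = s1[j] - s1[i]
        return max(s2[j] - s2[i] - total * total / (j - i), 0.0)

    return sse


def ml_segmentation(y, max_segments):
    """Dynamic program over breakpoints (assumed Gaussian noise model).

    Returns {k: (residual_sum_of_squares, breakpoints)} for k = 1..max_segments,
    where each breakpoint is the index at which a new segment starts.
    """
    n = len(y)
    sse = _sse_function(y)
    inf = float("inf")
    # cost[k][j]: smallest squared error when y[:j] is split into k segments
    cost = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    back = [[0] * (n + 1) for _ in range(max_segments + 1)]
    cost[0][0] = 0.0
    for k in range(1, max_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last segment is y[i:j]
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    back[k][j] = i
    results = {}
    for k in range(1, min(max_segments, n) + 1):
        breakpoints, j = [], n
        for kk in range(k, 0, -1):  # walk the back-pointers from the end
            j = back[kk][j]
            if kk > 1:
                breakpoints.append(j)
        results[k] = (cost[k][n], sorted(breakpoints))
    return results


def mdl_select(y, max_segments):
    """Pick the segment count with the smallest MDL score (assumed form)."""
    n = len(y)
    best = None
    for k, (rss, breakpoints) in ml_segmentation(y, max_segments).items():
        n_params = 2 * k - 1  # assumption: k levels + (k - 1) breakpoints
        # small floor on rss avoids log(0) for noise-free input
        score = 0.5 * n * math.log(max(rss, 1e-12) / n) + 0.5 * n_params * math.log(n)
        if best is None or score < best[0]:
            best = (score, k, breakpoints)
    return best[1], best[2]
```

Calling `mdl_select(y, max_segments)` on a list of log-ratio values returns the selected number of segments and the indices where new segments start. The triple loop costs O(K N^2) operations for N probes and at most K segments, which is practical for a single chromosome arm but would need pruning or windowing for a whole genome.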
