Genetic variation detection using maximum likelihood estimator

In recent years, submicroscopic DNA copy number differences have come to be recognized as an important source of human genetic variation and a significant contributor to disease susceptibility. Array comparative genomic hybridization (array CGH) has emerged as a powerful tool for assessing copy number change, and a number of algorithms have been developed to assign copy number segments accurately while minimizing errors from this inherently variable methodology. In this paper, we present an extended version of our previously proposed algorithm, the maximum likelihood estimator, for mapping and detecting copy number variations. The extension accounts for the unequal spacing between contiguous probes and adds a regional evaluation of the detected segments based on biological information about their genomic positions. Using genomic DNA from well-characterized cell lines, we compare the performance of the proposed method with that of popular commercial programs and published algorithms. The experimental results show that the proposed method outperforms these alternatives.
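
As a rough illustration of the maximum likelihood idea behind copy number segmentation, the sketch below finds a single breakpoint in a sequence of probe measurements. It is not the algorithm presented in this paper: it assumes a piecewise-constant mean with independent Gaussian noise of equal variance, in which case the maximum likelihood segment means are the sample means and the maximum likelihood breakpoint is the split that minimizes the residual sum of squares. It ignores probe spacing and the regional evaluation step entirely. The function name `best_single_breakpoint` and the use of log-ratio inputs are assumptions made for the example.

```python
import numpy as np


def best_single_breakpoint(log_ratios):
    """Return (k, gain) for the best split of the data into x[:k] and x[k:].

    Hypothetical helper, assuming i.i.d. Gaussian noise with equal variance.
    `gain` is the drop in residual sum of squares relative to fitting a
    single constant mean; a larger gain means stronger evidence for a change.
    """
    x = np.asarray(log_ratios, dtype=float)
    n = x.size
    if n < 2:
        raise ValueError("need at least two probes to place a breakpoint")

    # Cumulative sums let each candidate split be scored in O(1).
    csum = np.cumsum(x)
    csq = np.cumsum(x ** 2)
    total_sse = csq[-1] - csum[-1] ** 2 / n

    best_k, best_sse = 1, np.inf
    for k in range(1, n):
        # Residual sum of squares of the left segment x[:k].
        left_sse = csq[k - 1] - csum[k - 1] ** 2 / k
        # Residual sum of squares of the right segment x[k:].
        right_sum = csum[-1] - csum[k - 1]
        right_sse = (csq[-1] - csq[k - 1]) - right_sum ** 2 / (n - k)
        sse = left_sse + right_sse
        if sse < best_sse:
            best_k, best_sse = k, sse

    return best_k, total_sse - best_sse


if __name__ == "__main__":
    # Synthetic example: four probes at baseline, then four with a gain.
    k, gain = best_single_breakpoint([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
    print(k, gain)
```

Applying such a split recursively, or searching over several breakpoints at once with dynamic programming, would extend this to multiple segments. A stopping rule is then needed to avoid over-segmenting noisy data.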
