A novel approach to detect copy number variation using segmentation and genetic algorithm

Among the many forms of genomic variation, copy-number variations (CNVs) are defined as gains or losses of genomic DNA ranging from several kilobases to hundreds of kilobases. Since many CNVs contain genes and thereby lead to differential levels of gene expression, CNVs may account for a significant proportion of normal phenotypic variation. An earlier study demonstrated that a large portion of the currently known common human CNVs overlapping its dataset were smaller in that dataset. However, previous experimental studies, performed primarily with array CGH (a-CGH) techniques, have been limited to detecting large CNVs. Efficient algorithms for finding small CNVs are therefore essential. In this paper, we propose a novel approach, a sequential 2-dimensional clustering method, for finding small CNVs in a-CGH data. The proposed algorithm is robust to some level of noise and, regardless of the size of probes, can find CNVs consisting of a small number of probes.
