Conditional random pattern model for copy number aberration detection

Background: DNA copy number aberration (CNA) plays an important role in the pathogenesis of tumors and other diseases. For example, CNAs may result in the suppression of anti-oncogenes and the activation of oncogenes, which can cause certain types of cancer. High-density single nucleotide polymorphism (SNP) array data are widely used for CNA detection. However, detecting CNAs automatically is nontrivial because the signals obtained from high-density SNP arrays often have a low signal-to-noise ratio (SNR), which may be caused by whole genome amplification, mixtures of normal and tumor cells, experimental noise, or other technical limitations. As the SNR decreases, many false CNA regions are detected and true CNA regions are missed. Thus, more sophisticated statistical models are needed to make CNA detection from low-SNR signals more robust and reliable.

Results: This paper presents a conditional random pattern (CRP) model for CNA detection in which many contextual cues are exploited to suppress noise and improve detection accuracy. Both simulated and real data are used to evaluate the proposed model, and the validation results show that, compared with a number of widely used software packages, the CRP model is more robust and reliable in the presence of noise for CNA detection using high-density SNP array data.

Conclusions: The proposed conditional random pattern (CRP) model can effectively detect CNA regions in the presence of noise.
