A HIDDEN MARKOV MODEL FOR JOINT ESTIMATION OF GENOTYPE AND COPY NUMBER IN HIGH-THROUGHPUT SNP CHIPS

Amplifications and deletions of chromosomal DNA, as well as copy-neutral loss of heterozygosity, have been associated with disease processes. High-throughput single nucleotide polymorphism (SNP) arrays are useful for making genome-wide estimates of copy number and genotype calls. Because neighboring SNPs in high-throughput SNP arrays are likely to have dependent copy number and genotype due to the underlying haplotype structure and linkage disequilibrium, hidden Markov models (HMMs) may be useful for improving genotype calls and copy number estimates that do not incorporate information from nearby SNPs. We improve on previous approaches that use an HMM framework for inference in high-throughput SNP arrays by integrating copy number, genotype calls, and the corresponding confidence scores when available. Using simulated data, we demonstrate how confidence scores control smoothing in a probabilistic framework. Software for fitting HMMs to SNP array data is available in the R package ICE.
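To make the idea that confidence scores control smoothing concrete, the following is a minimal sketch in Python. It is not the ICE implementation. The two hidden states, the emission table `P_CALL`, the transition probability `stay`, and the mixture form of the emission (a call is informative with probability equal to its confidence score and uninformative otherwise) are all illustrative assumptions. The most likely state path is found with the Viterbi algorithm in log space.

```python
import math

# Hypothetical two-state model; the state names and all probabilities
# below are illustrative assumptions, not values from the paper.
STATES = ("normal", "LOH")
P_CALL = {
    "normal": {"hom": 0.7, "het": 0.3},
    "LOH": {"hom": 0.999, "het": 0.001},
}


def emission(state, call, conf):
    """Assumed mixture emission: with probability `conf` the genotype call
    is informative; otherwise it is treated as a coin flip (0.5)."""
    return conf * P_CALL[state][call] + (1.0 - conf) * 0.5


def viterbi(calls, confs, stay=0.9):
    """Return the most likely hidden-state path for a sequence of genotype
    calls and their confidence scores (uniform initial distribution)."""
    log_trans = {
        (s, t): math.log(stay if s == t else 1.0 - stay)
        for s in STATES
        for t in STATES
    }
    score = {
        s: math.log(0.5) + math.log(emission(s, calls[0], confs[0]))
        for s in STATES
    }
    back = []  # back[i][t] = best predecessor of state t at SNP i + 1
    for call, conf in zip(calls[1:], confs[1:]):
        prev, score, ptr = score, {}, {}
        for t in STATES:
            best = max(STATES, key=lambda s: prev[s] + log_trans[(s, t)])
            ptr[t] = best
            score[t] = (
                prev[best]
                + log_trans[(best, t)]
                + math.log(emission(t, call, conf))
            )
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# A run of homozygous calls interrupted by a single heterozygous call.
calls = ["hom"] * 10 + ["het"] + ["hom"] * 10
high_conf = [0.99] * 10 + [0.999] + [0.99] * 10
low_conf = [0.99] * 10 + [0.5] + [0.99] * 10
print(viterbi(calls, high_conf))
print(viterbi(calls, low_conf))
```

Under these toy parameters, a heterozygous call with a high confidence score interrupts the inferred LOH segment, whereas the same call with a low confidence score is smoothed over and the whole region is assigned to LOH. A copy number dimension could be added by enlarging the state space, but that is beyond this sketch.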
