Reducing system noise in copy number data using principal components of self-self hybridizations

Genomic copy number variation underlies genetic disorders such as autism, schizophrenia, and congenital heart disease. Copy number variations are commonly detected by array-based comparative genomic hybridization of sample to reference DNAs, but probe and operational variables combine to create correlated system noise that degrades detection of genetic events. To correct for this, we have explored hybridizations in which no genetic signal is expected, namely “self-self” hybridizations (SSH) comparing DNAs from the same genome. We show that SSH capture a variety of correlated system noise that is also present in sample-reference (test) data. Through singular value decomposition of SSH, we determine the principal components (PCs) of this noise. The PCs themselves offer insight into the sources of noise and facilitate detection of artifacts. We present evidence that linear and piecewise linear correction of test data with the PCs does not introduce detectable spurious signal, yet improves signal-to-noise metrics, reduces false positives, and facilitates copy number determination.
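The linear correction described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`noise_components`, `linear_correct`), the synthetic "wave" noise, the number of components `k`, and all array sizes are assumptions chosen for illustration, and the piecewise linear variant is not shown. The sketch takes the leading left singular vectors of a probes-by-arrays SSH matrix as noise PCs, then subtracts from a test profile its least-squares projection onto those PCs.

```python
import numpy as np

def noise_components(ssh, k):
    """Return the k leading left singular vectors of an SSH matrix.

    ssh: (n_probes, n_arrays) log-ratio matrix of self-self hybridizations.
    The result is an (n_probes, k) matrix with orthonormal columns.
    """
    # Remove each array's mean so a constant offset is not treated as noise structure.
    centered = ssh - ssh.mean(axis=0, keepdims=True)
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k]

def linear_correct(test, pcs):
    """Subtract the least-squares fit of the noise PCs from one test profile."""
    centered = test - test.mean()
    # Columns of pcs are orthonormal, so the fit coefficients are plain dot products.
    coeffs = pcs.T @ centered
    return test - pcs @ coeffs

# Synthetic data (all values hypothetical).
rng = np.random.default_rng(0)
n_probes, n_arrays = 2000, 30
wave = np.sin(np.linspace(0, 20 * np.pi, n_probes))  # shared system noise pattern
ssh = np.outer(wave, rng.normal(1.0, 0.3, n_arrays))
ssh += rng.normal(0.0, 0.1, (n_probes, n_arrays))    # independent probe noise

signal = np.zeros(n_probes)
signal[800:1000] = 0.5                                # simulated copy number gain
test = signal + 0.8 * wave + rng.normal(0.0, 0.1, n_probes)

pcs = noise_components(ssh, k=1)
corrected = linear_correct(test, pcs)

background = np.r_[0:800, 1000:n_probes]
print("background std before:", test[background].std())
print("background std after: ", corrected[background].std())
print("mean inside event:    ", corrected[800:1000].mean())
```

In this toy setting the correlated wave dominates the SSH matrix, so a single component suffices. With real arrays, the number of components to keep and whether the fit is applied globally or per segment are choices the sketch does not address.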
