Automated single-nucleotide polymorphism analysis using fluorescence excitation–emission spectroscopy and one-class classifiers

Abstract

We have developed a new, highly automated method of SNP (single nucleotide polymorphism) analysis for identifying genotypes. The data were generated by the TaqMan reaction. A total of 18 half-plates were analysed for different genes, each consisting of 48 wells: six synthetic DNA samples, three background samples, and 39 human DNA samples. Fluorescence spectra were obtained from each well. The characteristics of the spectra depended on which of three classes the genotype belonged to: homozygous wild-type, homozygous mutant, or heterozygous. The main problems are: (1) spectral variation from one half-plate to another is sometimes very substantial; (2) the spectra of heterozygous samples vary substantially; (3) outliers are common; and (4) not all possible alleles are represented on each half-plate, so the number of types of spectra can vary depending on the gene being analysed. We solved these problems by first applying a signal-standardisation technique (piecewise direct standardisation, PDS) and then building two one-class classifiers based on PCA models (PCA data description) to identify the two types of homozygote. The remaining samples were tested to see whether they could be approximated well by a linear combination of the spectra of the two types of homozygote. If they could, they were identified as heterozygous; if not, they were identified as outliers. The results are characterised by very low false-positive rates and overall false-negative rates of 2 to 6%.

Figure: Principal component scores after piecewise direct standardisation
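The final step of the procedure, deciding whether a sample rejected by both homozygote classifiers is a heterozygote or an outlier, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each homozygote class is summarised by a single reference spectrum (for example, the mean of the samples accepted by that class's one-class classifier), that spectra have already been standardised, and that a sample counts as heterozygous when a least-squares fit with two positive coefficients leaves a small relative residual. The function names, the positivity requirement, and the threshold `max_rel_residual` are hypothetical choices made for illustration; the abstract does not state how the quality of the approximation was judged.

```python
import math


def fit_two_component_mixture(spectrum, hom_a, hom_b):
    """Least-squares fit of spectrum ~ a * hom_a + b * hom_b.

    Returns (a, b, relative_residual). All inputs are equal-length
    sequences of intensities measured at the same wavelengths.
    """
    # Entries of the 2x2 normal equations for the coefficients a and b.
    saa = sum(p * p for p in hom_a)
    sbb = sum(q * q for q in hom_b)
    sab = sum(p * q for p, q in zip(hom_a, hom_b))
    sax = sum(p * x for p, x in zip(hom_a, spectrum))
    sbx = sum(q * x for q, x in zip(hom_b, spectrum))

    det = saa * sbb - sab * sab
    if det == 0:
        raise ValueError("reference spectra are collinear")

    # Closed-form solution of the normal equations (Cramer's rule).
    a = (sax * sbb - sbx * sab) / det
    b = (sbx * saa - sax * sab) / det

    residual = math.sqrt(
        sum((x - a * p - b * q) ** 2
            for x, p, q in zip(spectrum, hom_a, hom_b))
    )
    norm = math.sqrt(sum(x * x for x in spectrum))
    if norm == 0:
        raise ValueError("spectrum is identically zero")
    return a, b, residual / norm


def classify_remaining(spectrum, hom_a, hom_b, max_rel_residual=0.05):
    """Label a sample that neither homozygote classifier accepted.

    The threshold and the requirement that both coefficients be
    positive are illustrative assumptions, not values from the paper.
    """
    a, b, rel = fit_two_component_mixture(spectrum, hom_a, hom_b)
    if rel <= max_rel_residual and a > 0 and b > 0:
        return "heterozygote"
    return "outlier"
```

In use, `classify_remaining` would be called only on samples that both PCA-based one-class classifiers have rejected; a spectrum that is well described by a positive mixture of the two reference spectra is labelled a heterozygote, and anything else is flagged as an outlier for manual inspection.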
