Detecting single-feature polymorphisms using oligonucleotide arrays and robustified projection pursuit

MOTIVATION: Genomic DNA was hybridized to oligonucleotide microarrays to identify single-feature polymorphisms (SFPs) for Arabidopsis, which has a genome size of approximately 130 Mb. However, that method does not work well for organisms such as barley, with a much larger 5200 Mb genome. In the present study, we demonstrate SFP detection using a small number of replicate datasets and complex RNA as a surrogate for barley DNA. To identify single probes defining SFPs in the data, we developed a method using robustified projection pursuit (RPP). This method first evaluates, for each probe set, the overall differentiation of signal intensities between two genotypes and then measures the contribution of the individual probes within the probe set to the overall differentiation.

RESULTS: RNA from whole seedlings with and without dehydration stress provided 'present' calls for approximately 75% of probe sets. Using triplicated data, among the 5% of 'present' probe sets identified as most likely to contain at least one SFP probe, at least 80% are correctly predicted. This was determined by direct sequencing of PCR amplicons derived from barley genomic DNA. Using a 5 percentile cutoff, we defined 2007 SFP probes contained in 1684 probe sets by combining three parental genotype comparisons: Steptoe versus Morex, Morex versus Barke and Oregon Wolfe Barley Dominant versus Recessive.

AVAILABILITY: The algorithm is available upon request from the corresponding author.

CONTACT: xinping.cui@ucr.edu

SUPPLEMENTARY INFORMATION: http://faculty.ucr.edu/~xpcui
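The two-stage idea described under MOTIVATION (score the whole probe set first, then apportion that score among its probes) can be illustrated with a minimal sketch. This is not the authors' RPP algorithm, which is available only on request. It is a simplified stand-in that replaces the projection-pursuit search with a closed-form direction built from medians and median absolute deviations. The function name `probe_set_scores`, the input layout (replicates in rows, probes in columns, log2 intensities) and the scale floor are all assumptions made for illustration.

```python
import numpy as np


def probe_set_scores(a, b, eps=1e-6):
    """Illustrative two-stage score for one probe set (NOT the published RPP).

    a, b : arrays of shape (replicates, probes) holding log2 intensities
           for two genotypes (hypothetical input layout).
    Returns (overall, contributions):
      overall       : length of the robustly standardized difference vector
      contributions : share of each probe in overall**2 (sums to 1, or all
                      zeros when the two genotypes do not differ at all)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Robust per-probe location: median across replicates.
    med_a = np.median(a, axis=0)
    med_b = np.median(b, axis=0)

    # Per-probe difference, centered so that a uniform shift of the whole
    # probe set (an expression-level difference) does not count.
    diff = med_a - med_b
    diff = diff - np.median(diff)

    # Robust per-probe scale: MAD of the pooled within-genotype residuals,
    # floored at the median MAD (an assumed safeguard for few replicates).
    resid = np.vstack([a - med_a, b - med_b])
    mad = 1.4826 * np.median(np.abs(resid), axis=0)
    scale = np.maximum(mad, np.median(mad)) + eps

    z = diff / scale

    # Stage 1: overall differentiation of the probe set, i.e. the length of
    # z, which equals the projection of z onto its own unit direction.
    overall = float(np.linalg.norm(z))
    if overall == 0.0:
        return 0.0, np.zeros_like(z)

    # Stage 2: contribution of each probe, i.e. its squared loading on the
    # unit direction z / ||z||.
    contributions = (z / overall) ** 2
    return overall, contributions


# Usage: 3 replicates x 11 probes, with probe 4 shifted in genotype b.
base = np.linspace(8.0, 10.0, 11)
rep_effect = np.array([-0.1, 0.0, 0.1])[:, None]
geno_a = base + rep_effect
geno_b = base + rep_effect
geno_b[:, 4] -= 2.0

overall, contrib = probe_set_scores(geno_a, geno_b)
candidate_probe = int(np.argmax(contrib))
```

In a screen of this kind, probe sets would be ranked by `overall` and only the top fraction examined for a dominant probe. Where to place that cutoff is a choice the sketch does not address.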
