Highly accurate classification of Watson-Crick basepairs on termini of single DNA molecules.

We introduce a computational method for classifying individual DNA molecules measured by an alpha-hemolysin channel detector. We show classification with better than 99% accuracy for DNA hairpin molecules that differ only in their terminal Watson-Crick basepairs. Signal classification was first done in silico to establish performance metrics (i.e., training and test data were of known type, drawn from single-species data files). It was then performed in solution to assay real mixtures of DNA hairpins. Hidden Markov Models (HMMs) were used with Expectation/Maximization for denoising and for associating a feature vector with the ionic current blockade of each DNA molecule. Support Vector Machines (SVMs) were used as discriminators and were the focus of off-line training. A multiclass SVM architecture was designed to place less discriminatory load on weaker discriminators, and novel SVM kernels were used to boost discrimination strength. Tuning the HMMs and SVMs enabled biophysical analysis of the captured molecule's states and state transitions; structure revealed by that analysis was in turn used for better feature selection.
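The abstract does not spell out the multiclass architecture. One way to picture "less discriminatory load on weaker discriminators" is a cascade of binary discriminators, in which the class that is easiest to separate is peeled off first and each later stage sees fewer classes. The sketch below assumes that reading and is not the authors' design: the linear SVM trained by subgradient descent on the hinge loss, the class names, the two-dimensional feature vectors, the class ordering, and the hyperparameters are all hypothetical stand-ins for the HMM-derived features and novel kernels described above.

```python
# Illustrative sketch only: a cascade of binary linear SVMs, NOT the paper's
# architecture. All names, data, and hyperparameters here are hypothetical.

def train_linear_svm(points, labels, lr=0.01, lam=0.001, epochs=1000):
    """Fit w, b by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):  # y is +1 or -1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Weight decay from the regularizer is applied on every step.
            w = [wi - lr * lam * wi for wi in w]
            if margin < 1.0:  # hinge loss is active: step toward the sample
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def decision(model, x):
    """Signed score of a trained binary discriminator."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_cascade(data, order):
    """data maps class name -> list of feature vectors; order lists classes
    from (assumed) easiest to hardest to separate. Stage k learns
    order[k] versus all classes that come later in the order."""
    stages = []
    for k, name in enumerate(order[:-1]):
        rest = [x for other in order[k + 1:] for x in data[other]]
        points = data[name] + rest
        labels = [1] * len(data[name]) + [-1] * len(rest)
        stages.append((name, train_linear_svm(points, labels)))
    return stages, order[-1]


def classify(cascade, x):
    """Walk the stages in order; the first positive score decides the class."""
    stages, fallback = cascade
    for name, model in stages:
        if decision(model, x) > 0:
            return name
    return fallback


# Hypothetical 2-D feature vectors for three made-up hairpin classes.
toy = {
    "A": [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]],
    "B": [[3.0, 0.0], [3.2, 0.2], [2.9, 0.3]],
    "C": [[3.0, 3.0], [3.1, 2.8], [2.8, 3.2]],
}
cascade = train_cascade(toy, order=["A", "B", "C"])
print(classify(cascade, [0.1, 0.1]), classify(cascade, [3.1, 0.1]),
      classify(cascade, [2.95, 3.05]))
```

In this arrangement the last stage, which would hold the hardest pair of classes, only has to separate two classes, while the first stage handles the one-versus-rest split that is assumed to be easy. A real system would choose the ordering from measured discriminator strength rather than fixing it by hand.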
