Regression trees for analysis of mutational spectra in nucleotide sequences

MOTIVATION: The study and comparison of mutational spectra is an important problem in molecular biology, because these spectra often reveal important features of the action of mutagens and of the functioning of repair and replication enzymes. Mutability varies significantly along nucleotide sequences: mutations often concentrate at certain positions in a sequence, termed 'hotspots'.

RESULTS: We propose a regression analysis method based on regression trees to analyse the influence of nucleotide context on the occurrence of such hotspots. The REGRT program was tested on simulated and real mutational spectra. For the G:C→T:A mutational spectra induced by Sn1 alkylating agents (nine spectra), the prediction accuracy was 0.99.

AVAILABILITY: The REGRT program is available upon request from V. Berikov.
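The idea of relating hotspot intensity to nucleotide context with a regression tree can be illustrated by a minimal sketch. It is not the REGRT algorithm, whose details are not given here: it assumes a generic least-squares tree in which each test asks whether a given neighbouring position holds a given base. The function names, the two-base context encoding (5' neighbour, 3' neighbour) and the mutation counts are all hypothetical and chosen only for illustration.

```python
from statistics import fmean

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(contexts, counts):
    """Return (gain, position, base) for the test 'context[position] == base'
    that most reduces the squared error, or None if no test splits the data."""
    best = None
    base_error = sse(counts)
    for pos in range(len(contexts[0])):
        for base in "ACGT":
            yes = [c for ctx, c in zip(contexts, counts) if ctx[pos] == base]
            no = [c for ctx, c in zip(contexts, counts) if ctx[pos] != base]
            if not yes or not no:
                continue
            gain = base_error - sse(yes) - sse(no)
            if best is None or gain > best[0]:
                best = (gain, pos, base)
    return best

def grow(contexts, counts, depth=0, max_depth=2, min_gain=1e-9):
    """Recursively grow a tree; each leaf stores the mean mutation count."""
    split = best_split(contexts, counts) if depth < max_depth else None
    if split is None or split[0] <= min_gain:
        return {"leaf": fmean(counts)}
    _, pos, base = split
    yes = [(ctx, c) for ctx, c in zip(contexts, counts) if ctx[pos] == base]
    no = [(ctx, c) for ctx, c in zip(contexts, counts) if ctx[pos] != base]
    return {
        "pos": pos,
        "base": base,
        "yes": grow([x for x, _ in yes], [c for _, c in yes], depth + 1, max_depth, min_gain),
        "no": grow([x for x, _ in no], [c for _, c in no], depth + 1, max_depth, min_gain),
    }

def predict(tree, context):
    """Follow the tests from the root down to a leaf and return its mean."""
    while "leaf" not in tree:
        tree = tree["yes"] if context[tree["pos"]] == tree["base"] else tree["no"]
    return tree["leaf"]

# Invented toy spectrum: (5' neighbour + 3' neighbour of the mutated site, count).
toy = [("GA", 10), ("GC", 12), ("AT", 2), ("CT", 1),
       ("TA", 3), ("GT", 11), ("CA", 2), ("TC", 2)]
contexts = [ctx for ctx, _ in toy]
counts = [c for _, c in toy]

# A depth-1 tree: a single context test separating hot from cold sites.
tree = grow(contexts, counts, max_depth=1)
```

On this toy data the single test selected is whether the 5' neighbour is G, and the two leaves hold the mean counts of the sites that pass and fail it. A real analysis would use wider contexts, deeper trees and some form of pruning or validation to avoid overfitting.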
