
SeqinR 1.0-2: A Contributed Package to the R Project for Statistical Computing Devoted to Biological Sequences Retrieval and Analysis

The seqinR package for the R environment is a library of utilities to retrieve and analyze biological sequences. It provides an interface between (i) the R language and environment for statistical computing and graphics and (ii) the ACNUC sequence retrieval system for nucleotide and protein sequence databases such as GenBank, EMBL, and SWISS-PROT. ACNUC gives efficient direct access to subsequences of biological interest (e.g. protein coding regions, tRNA or rRNA coding regions) present in GenBank and EMBL. A simple query language makes it easy to select sequences of interest from within R and then analyze them with the full power of the R environment. The ACNUC databases can be installed locally, but they are more conveniently accessed through a web server, which takes advantage of centralized daily updates. The aim of this paper is to provide a handout on basic sequence analyses under seqinR, with a special focus on multivariate methods.

J. R. Lobry | Delphine Charif
