SeqinR 1.0-2: A Contributed Package to the R Project for Statistical Computing Devoted to Biological Sequences Retrieval and Analysis

The seqinR package for the R environment is a library of utilities for retrieving and analyzing biological sequences. It provides an interface between (i) the R language and environment for statistical computing and graphics and (ii) the ACNUC sequence retrieval system for nucleotide and protein sequence databases such as GenBank, EMBL, and SWISS-PROT. ACNUC gives efficient, direct access to subsequences of biological interest (e.g. protein-coding regions, or tRNA and rRNA coding regions) annotated in GenBank and EMBL. A simple query language makes it easy to select sequences of interest from within R and then analyze them with the full range of tools the R environment offers. The ACNUC databases can be installed locally, but they are more conveniently accessed through a web server, which gives the benefit of centralized daily updates. The aim of this paper is to provide a handout on basic sequence analyses under seqinR, with a special focus on multivariate methods.
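As an illustration of the kind of basic sequence analysis referred to above, the following is a minimal Python sketch that computes G+C content and in-frame codon counts for a coding sequence. It is not seqinR code and does not query ACNUC: the function names `gc_content` and `codon_counts` and the toy sequence are assumptions made for this example, and a real analysis would start from sequences retrieved from a database rather than a hard-coded string.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous bases A, C, G, T."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (counts["G"] + counts["C"]) / acgt


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, ignoring any incomplete codon at the 3' end."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Hypothetical toy coding sequence, used only for illustration.
toy_cds = "ATGGCGAAAGCGTAA"

if __name__ == "__main__":
    print(f"G+C content: {gc_content(toy_cds):.3f}")
    for codon, n in sorted(codon_counts(toy_cds).items()):
        print(codon, n)
```

A table of codon counts per gene, built this way for many coding sequences, is the typical input to the multivariate methods the paper focuses on.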
