A Quick Guide to Teaching R Programming to Computational Biology Students

The name “R” refers to the computational environment initially created by Robert Gentleman and Ross Ihaka, which is similar in nature to the “S” statistical environment developed at Bell Laboratories (http://www.r-project.org/about.html) [1]. It has since been developed and maintained by a strong team of core developers (R-core), who are renowned researchers in computational disciplines. R has gained wide acceptance as a reliable and powerful modern environment for statistical computing and visualisation, and is now used in many areas of scientific computation. R is free software, released under the GNU General Public License: anyone can inspect all of its source code, and there are no restrictive or costly licensing arrangements.

One of the main reasons that computational biologists use R is the Bioconductor project (http://www.bioconductor.org), a set of R packages for analysing genomic data. In many cases, these packages have been provided by researchers to complement descriptions of algorithms in journal articles. Many computational biologists regard R and Bioconductor as fundamental tools for their research.

R is a modern, functional programming language that allows for rapid development of ideas, together with object-oriented features for rigorous software development. Its rich set of built-in functions makes it well suited to high-volume analysis and statistical simulation, and its packaging system means that code written by others can easily be shared and reused. Finally, R generates high-quality graphical output, so all stages of a study, from modelling and analysis to publication, can be undertaken within R. For a detailed discussion of the merits of R in computational biology, see [2].

[1] Paul Murrell. R Graphics. Computer Science and Data Analysis Series, 2006.

[2] William N. Venables, et al. S Programming. 2000.

[3] Robert M. May. Simple mathematical models with very complicated dynamics. Nature, 1976.

[4] Laurence T. Maloney, et al. Introduction to Probability with R. 2009.

[5] Jean Y. H. Yang, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biology, 2004.

[6] Thomas D. Sandry, et al. Introductory Statistics with R. Technometrics, 2003.

[7] Alan F. Blackwell, et al. Programming. CSC '73, 1973.

[8] Martin Gardner. Mathematical games: the fantastic combinations of John Conway's new solitaire game "life". 1970.

[9] Matthias Schwab, et al. Making scientific computations reproducible. Comput. Sci. Eng., 2000.

[10] Duncan J. Murdoch, et al. A First Course in Statistical Programming with R. 2007.

[11] Donald E. Knuth. Computer programming as an art. CACM, 1974.

[12] Michael S. Waterman, et al. Computational Genome Analysis: An Introduction. 2007.

[13] Gabor Grothendieck, et al. Lattice: Multivariate Data Visualization with R. 2008.

[14] S. Ellner, et al. Dynamic Models in Biology. 2006.

[15] Robert Gentleman. R Programming for Bioinformatics. 2008.

[16] Sean R. Eddy. What is dynamic programming? Nature Biotechnology, 2004.

[17] Darren J. Wilkinson. Stochastic Modelling for Systems Biology. 2006.