Programming with R

Best practices for using R and designing programs

Learning Objectives

Define some best practices when working with R

  1. Start your code with a description of what it is:
#This is code to replicate the analyses and figures from my 2014 Science paper.
#Code developed by Sarah Supp, Tracy Teal, and Jon Borelli
  1. Run all of your import statements (library):
library(ggplot2)
library(reshape)
library(vegan)
  1. Set your working directory before source()ing a script, or start R inside your project folder.

One should exercise caution when using setwd(). Changing directories in your script can limit reproducibility:

  • setwd() will throw an error if the directory you’re trying to change to doesn’t exist, or if the user doesn’t have the correct permissions to access it. This becomes a problem when sharing scripts between users who have organized their directories differently.
  • If/when your script terminates with an error, you might leave the user in a different directory from the one they started in, and if they call the script again this will cause further problems. If you must use setwd(), it is best to put it at the top of the script to avoid this problem.

The following error message indicates that R has failed to set the working directory you specified:

Error in setwd("~/path/to/working/directory") : cannot change working directory

Consider using the convention that the user running the script should begin in the relevant directory on their machine and then use relative file paths (see below).
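
If you really do need to change directories inside a function, one way to limit the side effect described above is sketched below. The helper name with_project_dir is hypothetical; it relies on base R's on.exit() so that the user's original working directory is restored even when the code stops with an error.

```r
# Hypothetical helper: run some code inside `dir`, then always return
# to the directory the user started in.
with_project_dir <- function(dir, code) {
  if (!dir.exists(dir)) {
    stop("Directory does not exist: ", dir)
  }
  old_wd <- setwd(dir)      # setwd() invisibly returns the previous directory
  on.exit(setwd(old_wd))    # runs on normal exit and on error
  code()
}

# Usage: list the files in a temporary directory without leaving the user there.
start_dir <- getwd()
files_seen <- with_project_dir(tempdir(), function() list.files())
identical(getwd(), start_dir)   # the working directory is unchanged
```

This is only a sketch; starting R in the project folder and using relative paths remains the simpler convention.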

  1. Use # or #- to set off sections of your code so you can easily scroll through it and find things.

  2. If you have only one or a few functions, put them at the top of your code, so they are among the first things run. If you have written many functions, put them all in their own .R file and source it. Calling source() will define all of these functions so that you can use them as you need them. For the reasons listed above, try to avoid using setwd() (or other functions that have side effects in the user’s workspace) in scripts you source.

source("my_genius_fxns.R")
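
As a rough illustration of the last two points, a short script might be laid out as below. The section names, the standard_error function, and the data values are all made up for this sketch, and only base R is used so that it runs as-is.

```r
# ------------------------------------------------------------
# Description: toy example of a sectioned script (hypothetical)
# ------------------------------------------------------------

#- Functions ---------------------------------------------------
# With only one or two functions, define them near the top.
standard_error <- function(x) {
  sd(x) / sqrt(length(x))
}

#- Data --------------------------------------------------------
heights <- c(1.2, 1.5, 1.1, 1.8)   # made-up values

#- Analysis ----------------------------------------------------
se_height <- standard_error(heights)
print(se_height)
```

With many functions, the Functions section would shrink to a single source() call on a separate .R file.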
  1. Use consistent style within your code.

  2. Keep your code modular. If a single function or loop gets too long, consider breaking it into smaller pieces.

  3. Don’t repeat yourself. Automate! If you are repeating the same piece of code on multiple objects or files, use a loop or a function to do the same thing. The more you repeat yourself, the more likely you are to make a mistake.

  4. Manage all of your source files for a project in the same directory. Then use relative paths as necessary. For example, use

dat <- read.csv(file = "files/dataset-2013-01.csv", header = TRUE)

rather than:

dat <- read.csv(file = "/Users/Karthik/Documents/sannic-project/files/dataset-2013-01.csv", header = TRUE)
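
The "don't repeat yourself" and relative-path points can be combined in a short sketch. It assumes the data live in a files subfolder of the project directory; the second file name and the read_dataset helper are hypothetical. file.path() builds the relative path, and lapply() applies the same function to every file instead of copying the read.csv() call.

```r
# Hypothetical helper: read one dataset from the project's files/ folder.
read_dataset <- function(name, data_dir = "files") {
  path <- file.path(data_dir, name)   # relative path, e.g. "files/dataset-2013-01.csv"
  read.csv(file = path, header = TRUE)
}

# File names are assumptions for illustration.
dataset_names <- c("dataset-2013-01.csv", "dataset-2013-02.csv")

# Only read files that are actually present, so the sketch runs anywhere.
present <- dataset_names[file.exists(file.path("files", dataset_names))]
datasets <- lapply(present, read_dataset)
names(datasets) <- present
```

Because the paths are relative, the same script works for any collaborator who starts R in the project folder.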
  1. Don’t save a session history (R offers this by default when it asks whether you want an RData file). Instead, start in a clean environment so that older objects don’t contaminate your current environment. Leftover objects can lead to unexpected results, especially if the code is run on someone else’s machine.

  2. Where possible, keep a record of sessionInfo() somewhere in your project folder. Session information is invaluable because it records the R version and the versions of the packages loaded in the current session. If a newer version of a package changes the way a function behaves, you can always go back and reinstall the version that worked (Note: at least on CRAN, all older versions of packages are permanently archived).

  3. Collaborate. Grab a buddy and practice “code review”. We do it for methods and papers, why not code? Our code is a major scientific product and the result of a lot of hard work!

  4. Develop your code using version control and frequent updates!
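
One possible way to follow the sessionInfo() suggestion above is to write its output to a text file at the end of a script. The file name sessionInfo.txt is an arbitrary choice for this sketch, and it is written to a temporary directory only so that the example runs anywhere; in a real project you would point it at your project folder.

```r
# Capture the printed session information as lines of text.
session_lines <- capture.output(sessionInfo())

# Assumed file name; tempdir() is used only so the sketch runs anywhere.
out_file <- file.path(tempdir(), "sessionInfo.txt")
writeLines(session_lines, con = out_file)

# Quick check that something was written.
file.exists(out_file)
```

Keeping this file under version control alongside the code gives a record of which package versions produced a given result.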

Discussion - Best practice

  1. What other suggestions do you have?
  2. How could we restructure the code we worked on today to make it easier to read? Discuss with your neighbor.