Why Learn R?

If you’ve opened this post, you’re clearly interested in learning R and probably don’t need much convincing, but I’ll provide a few quick reminders just to keep your morale up. If your morale is pretty good, feel free to jump to the sections where I explain How best to learn R (including installation), and Where to find good resources.

As an aside, you may be interested to know that Lai et al. 2019 report that “The number of […ecological…] studies reported using R as their primary tool in data analysis increased linearly from 11.4% in 2008 to 58.0% in 2017.”

While there are many, many, many good reasons to learn R, here are my Top 5:

1. R has incredibly powerful capabilities for cleaning, analyzing and visualizing data

R is rapidly becoming the software of choice for data analysis in academia and data science because it has many packages/libraries containing functions that cover just about anything you want to do - databasing, stats, GIS, visualization, etc - all in one environment! There’s no need to go through the pain of moving data between different software packages anymore! Do it all in one seamless workflow in R!!!

There are also a huge user community and plenty of online resources providing tutorials, help fora, code, etc - see Where below for a start.

2. Your analyses are REPEATABLE and potentially REPRODUCIBLE*!!!

Because you write scripts when doing analyses in R, if you need to rerun your analysis, you need only rerun your script (as opposed to remembering what you clicked in which dropdown menu in what version of what programme etc.)

Scripts are also a MAJOR help if you need to run the same kind of analysis multiple times - you can just borrow from your old code! Similarly, if you want to perform the same analysis on a large number of datasets or data objects, you can set the code to loop through them automatically - as opposed to “click here, click there, click here again, and click again…” - repeat X 1000…
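To make the looping idea concrete, here is a minimal sketch in base R. The three small data frames and the summarise_height() helper are hypothetical stand-ins for your own datasets and analysis, so treat this as an illustration rather than a recipe.

```r
# Hypothetical example: three small data frames standing in for your own datasets
datasets <- list(
  site_a = data.frame(height = c(1.2, 1.5, 1.9)),
  site_b = data.frame(height = c(2.1, 2.4, 2.0)),
  site_c = data.frame(height = c(0.8, 1.1, 1.4))
)

# Hypothetical helper: the "analysis" we want to repeat on every dataset
summarise_height <- function(df) {
  c(n = nrow(df), mean = mean(df$height), sd = sd(df$height))
}

# Apply the same analysis to every dataset and collect the results in one table
# (one row per dataset, one column per summary statistic)
results <- t(sapply(datasets, summarise_height))
print(results)
```

Adding a fourth dataset only means adding one more entry to the list; the analysis code itself stays untouched.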

Your script also provides a useful “recipe” that helps you remember and explain the methods of the analysis you did! (I decided to italicize this because it should really be a separate point).

I should warn that there are a few extra steps one has to take to make sure your analysis stays repeatable (e.g. because of updates to R functions), and can be run on any computer, but this is easy to learn. Either way, you should learn it, because all science should be reproducible, firstly to clearly show your method and evidence of the correctness of your results, and secondly to enable others to make use of them.
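As a rough sketch of what a couple of those extra steps can look like, the snippet below fixes the random seed and records the R version used. These are only two common habits, chosen here for illustration; they are not a complete reproducibility workflow.

```r
# Fix the random seed so that "random" draws are identical on every run
set.seed(42)
x <- rnorm(10)

set.seed(42)
y <- rnorm(10)

# TRUE: resetting the same seed reproduces exactly the same numbers
print(identical(x, y))

# Record the R version (sessionInfo() also lists the attached packages),
# so that you or anyone else can later see what the analysis was run with
info <- sessionInfo()
print(info$R.version$version.string)
```

Saving the output of sessionInfo() alongside your results is a cheap way to document the software environment behind them.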

Lastly, reproducibility is likely to be the way of the future… One of the big criticisms of peer-reviewed science is that it is too slow and not responsive to societal needs. This can change. If the question is clearly defined and straightforward, the method you apply is peer-reviewed and accepted, and the results do not need in-depth interpretation (e.g. the answer is simply “Yes/No”), the turn-around time to publication can be greatly reduced.

3. R is good for you

(and I thank Adam Wilson for these points - although he may have someone else to thank)

a) Learning a programming language can help you learn how to think logically.

A [wo]man who does not know a foreign language is ignorant of his[their] own. – Johann Wolfgang von Goethe (1749 - 1832)

b) Programming gives you access to more computer power.

The computer is incredibly fast, accurate, and stupid. [Wo]Man is unbelievably slow, inaccurate, and brilliant. The marriage of the two is a force beyond calculation. – Leo Cherne

c) I’d also highlight that once you understand one programming language, picking up another is much, much easier should you wish/need to do so.

4. If you decide to leave science (or change your field of study), R is a “soft skill” that has solid benefits in many career paths!

  • Other academic disciplines
  • Data science
  • Research analytics
  • Finance
  • etc…

5. It’s FREE!!!

(i.e. open source)

SPSS costs $99.00 USD per user per month. ArcGIS???!!! - Don’t even go there…

R is also platform independent (i.e. operates on Windows, Mac, Linux, …)

How to learn R?

The learning curve…

The initial learning curve for any programming language can be incredibly steep. I have heard that R is less steep than many, but that’s not likely to make you feel any better when you’ve been banging your head against a wall for three days and still haven’t managed to get your analysis to run… (Yes, this WILL happen. Many, many times unfortunately…).

The only emotion as strong as the frustration you’ll be feeling after three days of banging your head is the sheer ecstasy of getting your code to work and seeing the outputs of your analysis! All of a sudden it will all be worth it!

My motivation…

The story I relate when people ask me if learning R is worth it is the tale of my first paper (published in 2006, so not THAT long ago…). All analyses in the paper are based on a dated molecular phylogeny of the beautiful and charismatic sedge genus Tetraria. Of course, one of the journal’s reviewers recommended we use a different method to date the phylogeny, meaning we had to rerun every single analysis in the manuscript. While we knew that this would make no difference to the results whatsoever (and were proven to be correct of course), you cannot just ignore reviewers’ comments if you want to get your science published. Despite knowing exactly what needed to happen step-by-step, it took me 3 weeks to repeat all analyses in the paper - using 9 different software packages across Windows, Mac and Linux environments… A few years later, and two years after I started to learn R (NOTE: I’m still learning R, you can never say “I’ve learnt R”), I spent 3 hours writing an R function that does the entire analysis in under 5 minutes!!!

The point of the story is that while you may know how to do the analysis you want to do in Excel/SPSS/ArcGIS/Statistica/JMP/(insert just about any software here) in a few hours, and it may take you a day or three to work out how to in R, sooner or later YOU WILL ALWAYS HAVE TO REPEAT YOUR ANALYSIS - usually quite a few times too…

Here are a few key tricks

THE FIRST LAW OF LEARNING R is accepting that the first time you try to do anything it may take you a little longer to get it to work in R than whatever you’re used to. The beauty is that you can do this safe in the knowledge that you’re saving yourself a lot of time further down the line.

THE SECOND LAW OF LEARNING R is having a dataset and a tricky problem to solve or analysis to perform. This is usually where the benefits of R’s flexibility (or existing code libraries) are clear, and one is motivated to spend the time pounding away at it.

THE THIRD LAW OF LEARNING R is community!!! Fortunately, there are a large number of R addicts out there who can provide ALL of the answers you need - BUT you need to work out how to ask Google or the online Forum the right question (always check the forum rules before you add a post!). You should also find any useR groups in your neighbourhood (or start one!). Sometimes it is very difficult to clarify your question, and having a human to help work it out is a huge help!

THE FOURTH LAW OF LEARNING R is to read about how R works. If pounding away at the code isn’t working, it’s time to get a better idea of how R (or the package or function you’re trying to use) works and how to tell it to do what you want it to do. R will always do what you tell it to do; making sure that it does (or has done) what you want it to do is the difficult bit. (NOTE: This also highlights the need to “reality check” that the output you get from R makes sense.)

THE FIFTH LAW OF LEARNING R is to never give up!!!
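As a small illustration of the fourth law, here is a sketch of the gap between what you tell R to do and what you want it to do, followed by a simple reality check. The heights vector is made-up data used only for this example.

```r
# Hypothetical measurements with one missing value
heights <- c(1.2, 1.5, NA, 1.9)

# R does exactly what it is told: the mean of a vector containing NA is NA
naive_mean <- mean(heights)
print(naive_mean)

# Reading the documentation (type ?mean at the console) reveals the na.rm
# argument, which drops missing values before the mean is calculated
clean_mean <- mean(heights, na.rm = TRUE)
print(clean_mean)

# Reality check: a mean must lie between the smallest and largest observed value
stopifnot(
  clean_mean >= min(heights, na.rm = TRUE),
  clean_mean <= max(heights, na.rm = TRUE)
)
```

Neither call is "wrong" as far as R is concerned; only the second one answers the question most people actually mean to ask.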

Where to learn R?

Here’s a quick list of resources to help get you started learning R.

  1. I provide a quick guide to Installing R in my primer on Handling Spatial Data in R, which I’ll come back to lower down.

  2. R for Data Science is a fantastic resource for learning how to programme in R, including visualization and communication, with a focus on a group of relatively new and very powerful R packages collectively known as the tidyverse. The only drawback for scientists is that it doesn’t delve into statistics and modelling very much, although it does point you to these great resources.

  3. A very thorough R resource, and great tool for learning statistics at the same time, is The R Book by Michael J. Crawley. Google it to find a copy. While it is a bit more old school and doesn’t include the latest and greatest tidyverse packages, in my opinion the first couple of chapters are a must read, giving you a great intro to how R works with objects and different data formats etc that will stand you in good stead in the long run. Do yourself a favour and take the time to work through them!

  4. There’s also a large number of other books to help get you started in R from introductory material through to methods in specific fields of research here, here and here.

  5. If books are too much for you, there are cheat sheets to help get you started with various packages. Googling will turn up more too.

  6. There are also online R courses run by groups like swirl, DataCamp or Coursera, but I feel you’re not likely to remember much if you don’t have your own data/problem to solve.

  7. And various different sources all over the internet:

  8. I’ll reiterate. Community is a big word in R.

  9. While I wouldn’t recommend starting with trying to use R as a geographical information system (GIS), once you have a bit of experience under your belt there are a growing number of great resources for this. Have a look at my primers on Handling Spatial Data in R and the resources therein to get you started.