R is our first-choice, go-to analysis system. In our opinion you really shouldn’t use something else until you have an articulated reason (be it a need for larger data scale, a different programming language, better data source integration, or something else). The advantages of R are numerous:
- Single integrated work environment.
- Powerful unified scripting/programming environment.
- Many good tutorials and books available.
- Wide range of machine learning and statistical libraries.
- Very solid standard statistical libraries.
- Excellent graphing/plotting/visualization facilities (especially ggplot2).
- Schema-oriented data frames allowing batch operations, plus simple row and column manipulation.
- Unified treatment of missing values (regardless of type).
For all that, we always end up feeling just a little worried and a little guilty when introducing a new user to R. R is very powerful and often has more than one way to perform a common operation or represent a common data type, so you are never very far away from a strange and painful corner case. This is why, when you get R training, you need to make sure you get an R expert (and not an R apologist). One of my favorite very smart experts is Norm Matloff (even his most recent talk title is smart: “What no one else will tell you about R”). Also, buy his book; we are very happy we purchased it.
But back to corner cases. For each method in R you really need to double-check that it actually works over the common R base data types (numeric, integer, character, factor, and logical). Not all of them do, and sometimes you get a surprise.
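One way to make that double-check routine is to run the method on a small sample of each base type and record what happens. The sketch below is only an illustration: `try_on_base_types` is a hypothetical helper name (not part of any package), and `sum()` is used purely as a stand-in for whatever method you want to vet.

```r
# Hypothetical helper: call f on one small vector of each common base type
# and report whether it ran cleanly, warned, or failed.
try_on_base_types <- function(f) {
  samples <- list(
    numeric   = c(1.5, 2.5, NA),
    integer   = c(1L, 2L, NA),
    character = c("a", "b", NA),
    factor    = factor(c("a", "b", NA)),
    logical   = c(TRUE, FALSE, NA)
  )
  vapply(samples, function(x) {
    tryCatch({
      f(x)
      "ok"
    },
    warning = function(w) paste("warning:", conditionMessage(w)),
    error   = function(e) paste("error:", conditionMessage(e)))
  }, character(1))
}

# Stand-in method: sum() accepts numeric, integer, and logical input,
# but refuses character vectors and factors.
res <- try_on_base_types(function(x) sum(x, na.rm = TRUE))
print(res)
```

Running a check like this before relying on a method tends to surface the type-specific surprises early, instead of in the middle of an analysis.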
Recent corner case problems we ran into include:
- randomForest regression fails on character arguments, but works on factors.
- gam() doesn’t convert strings to formulas.
- R maps can’t use the empty string as a key (that is the string of length 0, not a
These are all little things, but can be a pain to debug when you are in the middle of something else.
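The empty-string key problem is the easiest of these to reproduce in base R. The sketch below assumes that “maps” means named lists or environments used as key/value stores, which may not be the exact structure we hit the problem with. The variable names are ours, chosen for illustration.

```r
# A named list used as a map, with one entry stored under the empty string.
m <- setNames(list(1, 2), c("a", ""))

hit  <- m[["a"]]  # ordinary key: found
miss <- m[[""]]   # empty-string key: never matches a name, so lookup yields NULL

# An environment used as a map refuses the empty string outright.
e <- new.env()
env_result <- tryCatch(
  assign("", 1, envir = e),
  error = function(err) err
)

print(hit)
print(is.null(miss))
print(inherits(env_result, "error"))
```

A common workaround, if empty keys can occur in your data, is to prefix every key with a fixed character before using it as a name.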