Posted on Categories Programming, TutorialsTags , , , Leave a comment on Piping is Method Chaining

Piping is Method Chaining

What R users now call piping, popularized by Stefan Milton Bache and Hadley Wickham, is inline function application (this is notationally similar to, but distinct from the powerful interprocess communication and concurrency tool introduced to Unix by Douglas McIlroy in 1973). In object oriented languages this sort of notation for function application has been called “method chaining” since the days of Smalltalk (~1972). Let’s take a look at method chaining in Python, in terms of pipe notation.

Continue reading Piping is Method Chaining

Posted on Categories Computer Science, Programming, Public Service Article, TutorialsTags , , , , , 13 Comments on Using closures as objects in R

Using closures as objects in R

For more and more clients we have been using a nice coding pattern taught to us by Garrett Grolemund in his book Hands-On Programming with R: make a function that returns a list of functions. This turns out to be a classic functional programming techique: use closures to implement objects (terminology we will explain).

It is a pattern we strongly recommend, but with one caveat: it can leak references similar to the manner described in here. Once you work out how to stomp out the reference leaks the “function that returns a list of functions” pattern is really strong.

We will discuss this programming pattern and how to use it effectively. Continue reading Using closures as objects in R

Posted on Categories Opinion, Practical Data Science, Rants, StatisticsTags , , , 2 Comments on R has some sharp corners

R has some sharp corners

R is definitely our first choice go-to analysis system. In our opinion you really shouldn’t use something else until you have an articulated reason (be it a need for larger data scale, different programming language, better data source integration, or something else). The advantages of R are numerous:

  • Single integrated work environment.
  • Powerful unified scripting/programming environment.
  • Many many good tutorials and books available.
  • Wide range of machine learning and statistical libraries.
  • Very solid standard statistical libraries.
  • Excellent graphing/plotting/visualization facilities (especially ggplot2).
  • Schema oriented data frames allowing batch operations, plus simple row and column manipulation.
  • Unified treatment of missing values (regardless of type).

For all that we always end up feeling just a little worried and a little guilty when introducing a new user to R. R is very powerful and often has more than one way to perform a common operation or represent a common data type. So you are never very far away from a strange and painful corner case. This why when you get R training you need to make sure you get an R expert (and not an R apologist). One of my favorite very smart experts is Norm Matloff (even his most recent talk title is smart: “What no one else will tell you about R”). Also, buy his book; we are very happy we purchased it.

But back to corner cases. For each method in R you really need to double check if it actually works over the common R base data types (numeric, integer, character, factor, and logical). Not all of them do and and sometimes you get a surprise.

Recent corner case problems we ran into include:

  • randomForest regression fails on character arguments, but works on factors.
  • mgcv gam() model doesn’t convert strings to formulas.
  • R maps can’t use the empty string as a key (that is the string of length 0, not a NULL array or NA value).

These are all little things, but can be a pain to debug when you are in the middle of something else. Continue reading R has some sharp corners