Categories: data science, Opinion, Programming, Tutorials

seplyr 0.5.8 Now Available on CRAN

We are pleased to announce that seplyr version 0.5.8 is now available on CRAN.

seplyr is an R package that provides a thin wrapper around elements of the dplyr package and (now with version 0.5.8) the tidyr package. The intent is to give the part-time R user the ability to easily program over functions from the popular dplyr and tidyr packages. Our assumption is always that a data scientist most often comes to R to work with data, not to tinker with the programming language itself.

Categories: Coding, Tutorials

R Tip: Be Wary of “…”

R Tip: be wary of “...”.

The following code example contains an easy error in using the R function unique().

vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")
unique(vec1, vec2)
# [1] "a" "b" "c"

Notice that none of the novel values from vec2 are present in the result. Our mistake was that we (improperly) tried to use unique() with multiple value arguments, as one would use union(). Also notice that no error or warning was signaled: we used unique() incorrectly and nothing pointed this out to us. What compounded our error was R’s “...” function signature feature.
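For comparison, here is a minimal sketch of two base-R ways to get the result we presumably wanted: either combine the vectors before calling unique(), or call union() directly. The vectors are the same ones used above.

```r
vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")

# Combine first, then de-duplicate: unique() sees a single vector.
combined <- unique(c(vec1, vec2))
print(combined)
# [1] "a" "b" "c" "d"

# union() is designed to take two vectors of values.
both <- union(vec1, vec2)
print(both)
# [1] "a" "b" "c" "d"
```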

In this note I will talk a bit about how to defend against this kind of mistake. I am going to apply the principle that a design that makes committing mistakes more difficult (or even impossible) is a good thing, and not a sign of carelessness, laziness, or weakness. I am well aware that every time I admit to making a mistake (I have indeed made the above mistake) those who claim to never make mistakes have a laugh at my expense. Honestly I feel the reason I see more mistakes is I check a lot more.

Categories: Administrativia, Coding, Programming

wrapr 1.5.0 available on CRAN

The R package wrapr 1.5.0 is now available on CRAN.

wrapr includes a lot of tools for writing better R code.

I’ll be writing articles on a number of the new capabilities. For now I just leave you with the nifty operator coalesce notation.

Categories: Opinion, Programming, Tutorials

wrapr 1.4.1 now up on CRAN

wrapr 1.4.1 is now available on CRAN. wrapr is a really neat R package for organizing, meta-programming, and debugging R code. This update generalizes the S3 features of the dot-pipe operator.

Please give it a try!

Categories: Coding, Opinion, Tutorials

magrittr and wrapr Pipes in R, an Examination

Let’s consider piping in R both using the magrittr package and using the wrapr package.

Categories: Coding, Opinion, Pragmatic Data Science, Statistics, Tutorials

R Tip: Think in Terms of Values

R tip: first organize your tasks in terms of data, values, and desired transformation of values, not initially in terms of concrete functions or code.

I know I write a lot about coding in R. But it is in the service of supporting statistics, analysis, predictive analytics, and data science.

R without data is like going to the theater to watch the curtain go up and down.

(Adapted from Ben Katchor’s Julius Knipl, Real Estate Photographer: Stories, Little, Brown, and Company, 1996, page 72, “Excursionist Drama 2”.)

Usually you come to R to work with data. If you think and plan in terms of data and values (including introducing more data to control processing) you will usually work in a much faster, more explainable, and more maintainable fashion.

Categories: Coding, Tutorials

R Tip: Use Named Vectors to Re-Map Values

Here is an R tip. Want to re-map a column of values? Use a named vector as the mapping.
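As a minimal sketch of the idea in base R: the mapping is a character vector whose names are the old values and whose entries are the new values, and indexing it by the column performs the re-map. The data and the names `fruit_map` and `d` are made up for illustration.

```r
# Hypothetical mapping: names are the codes, values are the replacements.
fruit_map <- c(a = "apple", b = "banana", c = "cherry")

# Hypothetical data with a column of codes to re-map.
d <- data.frame(code = c("a", "c", "b", "a"), stringsAsFactors = FALSE)

# Index the named vector by the column; unname() drops the carried-over names.
d$fruit <- unname(fruit_map[d$code])

print(d$fruit)
# [1] "apple"  "cherry" "banana" "apple"
```

Codes that are missing from the mapping come back as NA, which makes unmapped values easy to spot.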

Categories: Coding, Opinion, Statistics, Tutorials

R Tip: Use let() to Re-Map Names

Another R tip. Need to replace a name in some R code or make R code re-usable? Use wrapr::let().



Categories: Coding, Statistics, Tutorials

R Tip: Force Named Arguments

R tip: force the use of named arguments when designing function signatures.

R’s named function argument binding is a great aid in writing correct programs. It is a good idea, if practical, to force optional arguments to only be usable by name. To do this declare the additional arguments after “...” and enforce that none got lost in the “... trap” by using a checker such as wrapr::stop_if_dot_args().

Example:

#' Increment x by inc.
#' 
#' @param x item to add to
#' @param ... not used for values, forces later arguments to bind by name
#' @param inc (optional) value to add
#' @return x+inc
#'
#' @examples
#'
#' f(7) # returns 8
#'
f <- function(x, ..., inc = 1) {
   wrapr::stop_if_dot_args(substitute(list(...)), "f")
   x + inc
}

f(7)
#> [1] 8

f(7, inc = 2)
#> [1] 9


f(7, q = mtcars)
#> Error: f unexpected arguments: q = mtcars

f(7, 2)
#> Error: f unexpected arguments: 2 

By R’s function evaluation rules, any unexpected or undeclared arguments are captured by the “...” argument. wrapr::stop_if_dot_args() then inspects what was captured and signals an error if anything is there. The "f" string is included in the error message; in this case I chose the name of the function. The “substitute(list(...))” part is R’s way of making the contents of “...” available for inspection.
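To make the mechanism concrete, here is a rough base-R sketch of what such a checker might do. This is not wrapr’s implementation; `check_no_dots` and `g` are hypothetical names introduced only for illustration, and the error text is an approximation.

```r
# Hypothetical stand-in for a "..." checker (not wrapr's actual code).
check_no_dots <- function(dot_args, fn_name) {
  # dot_args is the unevaluated call list(...); drop the leading `list` symbol.
  extra <- as.list(dot_args)[-1]
  if (length(extra) > 0) {
    shown <- vapply(extra,
                    function(a) paste(deparse(a), collapse = " "),
                    character(1))
    stop(fn_name, " unexpected arguments: ",
         paste(shown, collapse = ", "), call. = FALSE)
  }
  invisible(NULL)
}

g <- function(x, ..., inc = 1) {
  check_no_dots(substitute(list(...)), "g")
  x + inc
}

res_default <- g(7)           # inc keeps its default
res_named   <- g(7, inc = 2)  # inc bound by name
res_bad     <- try(g(7, 2), silent = TRUE)  # positional 2 lands in "..."
```

Because the captured arguments are never evaluated, the check costs almost nothing and does not trigger side effects in whatever the caller passed by mistake.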

You can also use the technique on required arguments. wrapr::stop_if_dot_args() is a simple low-dependency helper function intended to make writing code such as the above easier. This is under the rubric that hidden errors are worse than thrown exceptions. It is best to find and signal problems early, and near the cause.
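For the required-argument case, a minimal base-R sketch could look like the following. Placing “...” first means every remaining argument can only be bound by name. The function `ratio` and its parameters are hypothetical, and the check is written inline rather than with wrapr so the example is self-contained.

```r
# Hypothetical example: both arguments are required, and both must be named.
ratio <- function(..., numerator, denominator) {
  # Anything passed positionally ends up in "..."; treat that as an error.
  extra <- as.list(substitute(list(...)))[-1]
  if (length(extra) > 0) {
    stop("ratio: all arguments must be passed by name", call. = FALSE)
  }
  numerator / denominator
}

ok  <- ratio(numerator = 1, denominator = 4)
bad <- try(ratio(1, 4), silent = TRUE)  # positional use is rejected

print(ok)
# [1] 0.25
```

This is helpful when two arguments have the same type and are easy to swap, as with a numerator and a denominator.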

The idea is that you should not expect a user to remember the positions of more than 1 to 3 arguments; the rest should only be referable by name. Do not make your users count along long sequences of arguments; the human brain may have special cases for small sequences.

If you have a procedure with 10 parameters, you probably missed some.

Alan Perlis, “Epigrams on Programming”, ACM SIGPLAN Notices 17 (9), September 1982, pp. 7–13.

Note that the “substitute(list(...))” part is the R idiom for capturing the unevaluated contents of “...”. I felt it best to use standard R as much as possible rather than introducing any additional magic invocations.