Posted on Categories Opinion, Programming, TutorialsTags , 11 Comments on Programming Over lm() in R

Programming Over lm() in R

Here is simple modeling problem in R.

We want to fit a linear model where the names of the data columns carrying the outcome to predict (y), the explanatory variables (x1, x2), and per-example row weights (wt) are given to us as string values in variables.

Continue reading Programming Over lm() in R

Posted on Categories data science, Opinion, Pragmatic Data Science, Programming, StatisticsTags , , , 3 Comments on Data Manipulation Corner Cases

Data Manipulation Corner Cases

Let’s try some "ugly corner cases" for data manipulation in R. Corner cases are examples where the user might be running to the edge of where the package developer intended their package to work, and thus often where things can go wrong.

Let’s see what happens when we try to stick a fork in the power-outlet.

Fork

Continue reading Data Manipulation Corner Cases

Posted on Categories Opinion, Programming, RantsTags , 2 Comments on Very Non-Standard Calling in R

Very Non-Standard Calling in R

Our group has done a lot of work with non-standard calling conventions in R.

Our tools work hard to eliminate non-standard calling (as is the purpose of wrapr::let()), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). And even so, we still get surprised by some of the side-effects and mal-consequences of the over-use of non-standard calling conventions in R.

Please read on for a recent example.

Continue reading Very Non-Standard Calling in R

Posted on Categories Coding, TutorialsTags , , , ,

R Tip: Be Wary of “…”

R Tip: be wary of “...“.

The following code example contains an easy error in using the R function unique().

vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")
unique(vec1, vec2)
# [1] "a" "b" "c"

Notice none of the novel values from vec2 are present in the result. Our mistake was: we (improperly) tried to use unique() with multiple value arguments, as one would use union(). Also notice no error or warning was signaled. We used unique() incorrectly and nothing pointed this out to us. What compounded our error was R‘s “...” function signature feature.

In this note I will talk a bit about how to defend against this kind of mistake. I am going to apply the principle that a design that makes committing mistakes more difficult (or even impossible) is a good thing, and not a sign of carelessness, laziness, or weakness. I am well aware that every time I admit to making a mistake (I have indeed made the above mistake) those who claim to never make mistakes have a laugh at my expense. Honestly I feel the reason I see more mistakes is I check a lot more.

Continue reading R Tip: Be Wary of “…”

Posted on Categories Opinion, Programming, StatisticsTags , , , 14 Comments on Neglected R Super Functions

Neglected R Super Functions

R has a lot of under-appreciated super powerful functions. I list a few of our favorites below.


6095431665 88664494f0 b

Atlas, carrying the sky. Royal Palace (Paleis op de Dam), Amsterdam.

Photo: Dominik Bartsch, CC some rights reserved.

Continue reading Neglected R Super Functions

Posted on Categories Coding, TutorialsTags , , , 4 Comments on R Tip: Use stringsAsFactors = FALSE

R Tip: Use stringsAsFactors = FALSE

R tip: use stringsAsFactors = FALSE.

R often uses a concept of factors to re-encode strings. This can be too early and too aggressive. Sometimes a string is just a string.


800px Sigmund Freud by Max Halberstadt cropped

It is often claimed Sigmund Freud said “Sometimes a cigar is just a cigar.”

Continue reading R Tip: Use stringsAsFactors = FALSE

Posted on Categories Coding, OpinionTags , , , , , , 4 Comments on Take Care If Trying the RPostgres Package

Take Care If Trying the RPostgres Package

Take care if trying the new RPostgres database connection package. By default it returns some non-standard types that code developed against other database drivers may not expect, and may not be ready to defend against.


Danger

Danger, Will Robinson!

Continue reading Take Care If Trying the RPostgres Package

Posted on Categories Opinion, Rants, StatisticsTags , 3 Comments on The Many Faces of R

The Many Faces of R

Some days I see R as an eclectic programming language preferred by scientists.

“Programming languages as people.”

PP2

From Leftover Salad (David Marino).

Other days I see it more like the following.

Continue reading The Many Faces of R

Posted on Categories Coding, Statistics, TutorialsTags , , , 1 Comment on R Tip: Introduce Indices to Avoid for() Class Loss Issues

R Tip: Introduce Indices to Avoid for() Class Loss Issues

Here is an R tip. Use loop indices to avoid for()-loops damaging classes.

Below is an R annoyance that occurs again and again: vectors lose class attributes when you iterate over them in a for()-loop.

d <- c(Sys.time(), Sys.time())
print(d)
#> [1] "2018-02-18 10:16:16 PST" "2018-02-18 10:16:16 PST"

for(di in d) {
  print(di)
}
#> [1] 1518977777
#> [1] 1518977777

Notice we printed numbers, not dates/times. To avoid this problem introduce an index, and loop over that, not over the vector contents.

for(ii in seq_along(d)) {
  di <- d[[ii]]
  print(di)
}
#> [1] "2018-02-18 10:16:16 PST"
#> [1] "2018-02-18 10:16:16 PST"

Continue reading R Tip: Introduce Indices to Avoid for() Class Loss Issues

Posted on Categories Coding, Programming, TutorialsTags , , , 8 Comments on R Tip: Use drop = FALSE with data.frames

R Tip: Use drop = FALSE with data.frames

Another R tip. Get in the habit of using drop = FALSE when indexing (using [ , ] on) data.frames.

NewImage

Prince Rupert’s drops (img: Wikimedia Commons)

Continue reading R Tip: Use drop = FALSE with data.frames