
R Tip: Use seq_len() to Avoid The Backwards Sequence Trap

Another R tip. Use seq_len() to avoid the backwards sequence trap.

Many R users use the “colon sequence” notation to build sequences. For example:

for(i in 1:5) {
  print(paste(i, i*i))
}
#> [1] "1 1"
#> [1] "2 4"
#> [1] "3 9"
#> [1] "4 16"
#> [1] "5 25"

However, the colon notation can be unsafe as it does not properly handle the empty sequence case:

n <- 0

1:n
#> [1] 1 0

Notice the above example built a reversed sequence, instead of an empty sequence.

This leads to the backwards sequence trap: writing code of the form “1:length(x)” is often wrong. For example, “for(i in 1:length(x)) { statements involving x[[i]] }” will fail for length-zero x: the loop still runs, first with i = 1 and then with i = 0, and x[[1]] is out of bounds.
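
To make the failure concrete, here is a small sketch (the names x, iterations, and result are illustrative only) showing that the loop body runs twice for an empty vector, and that indexing the first element raises an error:

```r
x <- numeric(0)   # a length-zero vector

iterations <- 0
for (i in 1:length(x)) {   # 1:0 is c(1, 0), so the body runs twice
  iterations <- iterations + 1
}
iterations
#> [1] 2

# Indexing inside such a loop is what typically fails;
# tryCatch() is used here only so the example runs to completion.
result <- tryCatch(x[[1]], error = function(e) "error")
result
#> [1] "error"
```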

To avoid this, use seq_len():

seq_len(5)
#> [1] 1 2 3 4 5

n <- 0
seq_len(n)
#> integer(0)

“integer(0)” is a length-zero sequence of integers (not a sequence containing the value zero).
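
When the loop is over the positions of an existing vector, base R's seq_along(x) gives the same result as seq_len(length(x)). A minimal sketch of a loop that is safe for empty input (the helper name squares_of is made up for this illustration):

```r
squares_of <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) {   # zero iterations when x is empty
    out[[i]] <- x[[i]]^2
  }
  out
}

squares_of(c(1, 2, 3))
#> [1] 1 4 9

squares_of(numeric(0))
#> numeric(0)
```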


R Tip: Use qc() For Fast Legible Quoting

Here is an R tip. Need to quote a lot of names at once? Use qc().

This is particularly useful in selecting columns from data.frames:

library("wrapr")  # get qc() definition

head(mtcars[, qc(mpg, cyl, wt)])

#                    mpg cyl    wt
# Mazda RX4         21.0   6 2.620
# Mazda RX4 Wag     21.0   6 2.875
# Datsun 710        22.8   4 2.320
# Hornet 4 Drive    21.4   6 3.215
# Hornet Sportabout 18.7   8 3.440
# Valiant           18.1   6 3.460

Or even to install many packages at once:

install.packages(qc(vtreat, cdata, WVPlots))
# shorter than the alternative:
#  install.packages(c("vtreat", "cdata", "WVPlots"))
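
For readers curious how this kind of quoting can work, here is a rough sketch using base R's substitute(). The function quote_names is a hypothetical stand-in written for this illustration; it is not wrapr's actual implementation of qc(), which may handle more cases.

```r
# Hypothetical helper: capture the unevaluated arguments and
# turn each one into a string.
quote_names <- function(...) {
  args <- as.list(substitute(list(...)))[-1]   # drop the leading `list`
  vapply(args,
         function(a) paste(deparse(a), collapse = ""),
         character(1),
         USE.NAMES = FALSE)
}

quote_names(mpg, cyl, wt)
#> [1] "mpg" "cyl" "wt"

# The result can be used wherever a character vector of names is expected.
selected <- names(mtcars[, quote_names(mpg, cyl, wt)])
```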

We Want to be Playing with a Moderate Number of Powerful Blocks

Many data scientists (and even statisticians) suffer under one of the following misapprehensions:

  • They believe a technique doesn’t work in their current situation (when in fact it does), leading to useless precautions and missed opportunities.
  • They believe a technique does work in their current situation (when in fact it does not), leading to failed experiments or incorrect results.

I feel this happens less often if you are working with observable and composable tools of the proper scale. Somewhere between monolithic all-in-one systems and ad hoc one-off coding is a cognitive sweet spot where great work can be done.



Is 10,000 Cells Big?

Trick question: is a 10,000 cell numeric data.frame big or small?

In the era of "big data," 10,000 cells is minuscule. Such data could fit on fewer than 1,000 punched cards (less than half a box).


[Figure: punch card]

The joking answer is: it is small when they are selling you the system, but can be considered unfairly large later.



Why No Exact Permutation Tests at Scale?

Here at Win-Vector LLC we like permutation tests. Our team has written on them (for example: How Do You Know if Your Data Has Signal?) and they are used to estimate significances in our sigr and WVPlots R packages. For example, permutation methods are used to estimate the significance reported in the following ROC plot.

[Figure: ROC plot with significance estimated by permutation methods]

Permutation tests have their own literature and issues (examples: Permutation, Parametric and Bootstrap Tests of Hypotheses, Springer-Verlag, NY, 1994 (3rd edition, 2005), 2, 3, and 4).

In our R packages the permutation tests are estimated by a sampling procedure, and not computed exactly (or deterministically). It turns out this is likely a necessary concession; a complete exact permutation test procedure at scale would be big news. Please read on for my comments on this issue.
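
As a rough illustration of what "estimated by a sampling procedure" can mean, here is a minimal sketch of a sampled permutation test for a difference in means. It is not the code used in sigr or WVPlots; the function name perm_test_mean_diff, the toy data, and the add-one correction are all assumptions made for this example.

```r
# Sketch: estimate a two-sided permutation p-value by random shuffles
# rather than by enumerating every possible relabeling.
perm_test_mean_diff <- function(x, y, n_rep = 2000) {
  stopifnot(length(x) > 0, length(y) > 0)
  observed <- abs(mean(x) - mean(y))
  pooled <- c(x, y)
  in_x <- seq_along(pooled) <= length(x)   # first length(x) slots play the role of x
  stats <- vapply(seq_len(n_rep), function(i) {
    shuffled <- sample(pooled)             # one random relabeling
    abs(mean(shuffled[in_x]) - mean(shuffled[!in_x]))
  }, numeric(1))
  # add-one correction keeps the estimate strictly positive
  (1 + sum(stats >= observed)) / (1 + n_rep)
}

set.seed(2018)   # arbitrary seed, only for reproducibility
x <- c(5.1, 4.9, 6.2, 5.8, 6.0)
y <- c(4.2, 4.8, 4.1, 5.0, 4.4)
p <- perm_test_mean_diff(x, y)

# Number of distinct ways to split the pooled data into groups of these sizes;
# small here, but it grows very quickly with sample size.
n_splits <- choose(length(x) + length(y), length(x))
n_splits
#> [1] 252
```

With only ten observations the 252 splits could be enumerated directly; the sampling approach matters because that count becomes astronomically large for realistic data sizes.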
