Recently I noticed that the R package sparklyr had the following odd behavior:
suppressPackageStartupMessages(library("dplyr"))
library("sparklyr")
packageVersion("dplyr")
#> [1] '0.7.2.9000'
packageVersion("sparklyr")
#> [1] '0.6.2'
packageVersion("dbplyr")
#> [1] '1.1.0.9000'
sc <- spark_connect(master = 'local')
#> * Using Spark: 2.1.0
d <- dplyr::copy_to(sc, data.frame(x = 1:2))
dim(d)
#> [1] NA
ncol(d)
#> [1] NA
nrow(d)
#> [1] NA
This means user code or user analyses that depend on any of dim(), ncol(), or nrow() may break. nrow() used to return something other than NA, so older work may not be reproducible.
In fact: where I actually noticed this was deep in debugging a client project (not in a trivial example, such as above).
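One defensive pattern for code that needs these counts is to fall back to computing them with dplyr verbs when nrow() comes back as NA. The following is only a minimal sketch: count_rows() and count_cols() are hypothetical helper names (not part of sparklyr, dplyr, or dbplyr), and the fallback path assumes the remote table supports summarise(), collect(), and tbl_vars(). It is demonstrated here on a local data.frame so that it runs without a Spark connection; it has not been checked against the package versions shown above.

```r
suppressPackageStartupMessages(library("dplyr"))

# Hypothetical helper: return a row count even when nrow() yields NA,
# as it does for the remote table in the example above.
count_rows <- function(tbl) {
  rows <- nrow(tbl)
  if (is.null(rows) || any(is.na(rows))) {
    # Fall back to counting where the data lives, then bring the
    # single-row result back into R.
    rows <- tbl %>%
      summarise(row_count = n()) %>%
      collect() %>%
      pull(row_count)
  }
  as.numeric(rows)
}

# Hypothetical helper: count columns from the table's variable names,
# avoiding ncol() entirely.
count_cols <- function(tbl) {
  length(tbl_vars(tbl))
}

# Local demonstration; a remote table would be expected to take the
# fallback branch in count_rows().
local_d <- data.frame(x = 1:2)
count_rows(local_d)
count_cols(local_d)
```

Note the design trade-off: on a remote table the fallback runs a query, so it can be far more expensive than nrow() is on a local data.frame.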

Tron: fights for the users.
In my opinion: this choice is going to be a great source of surprises, unexpected behavior, and bugs going forward for both sparklyr and dbplyr users.