Not a full `R` article, but a quick note demonstrating by example the advantage of being able to collect many expressions and pack them into a single `extend_se()` node.

# Category: Tutorials

## seplyr 0.5.8 Now Available on CRAN

We are pleased to announce that seplyr version 0.5.8 is now available on CRAN.

seplyr is an R package that provides a thin wrapper around elements of the dplyr package and (now with version 0.5.8) the tidyr package. The intent is to give the part-time R user the ability to easily program over functions from the popular dplyr and tidyr packages. Our assumption is always that a data scientist most often comes to R to work with data, not to tinker with the programming language itself.

## R Tip: Be Wary of “…”

R Tip: be wary of “`...`”.

The following code example contains an easy-to-make error in using the R function `unique()`.

```r
vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")
unique(vec1, vec2)
# [1] "a" "b" "c"
```

Notice none of the novel values from `vec2` are present in the result. Our mistake was: we (improperly) tried to use `unique()` with multiple value arguments, as one would use `union()`. Also notice no error or warning was signaled. We used `unique()` incorrectly and nothing pointed this out to us. What compounded our error was `R`'s “`...`” function signature feature.
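A minimal base-`R` sketch of what we meant to write: `union()`, or `unique()` applied to the concatenated vectors, returns the distinct values from both inputs. The last line shows where `vec2` actually went in the original call: in the default method the second positional argument is matched to the `incomparables` parameter, and any further value arguments would fall into `...`.

```r
vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")

# What we intended: the distinct values appearing in either vector.
union(vec1, vec2)
# [1] "a" "b" "c" "d"
unique(c(vec1, vec2))
# [1] "a" "b" "c" "d"

# Where vec2 went in the original call: unique.default(x, incomparables, ...)
# matches the second positional argument to `incomparables`.
identical(unique(vec1, vec2), unique(vec1, incomparables = vec2))
# [1] TRUE
```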

In this note I will talk a bit about how to defend against this kind of mistake. I am going to apply the principle that a design that makes committing mistakes more difficult (or even impossible) is a good thing, and not a sign of carelessness, laziness, or weakness. I am well aware that every time I admit to making a mistake (I have indeed made the above mistake), those who claim to never make mistakes have a laugh at my expense. Honestly, I feel the reason I *see* more mistakes is that I check a lot more.
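One way to apply that principle, sketched under our own naming: `unique_strict()` below is a hypothetical wrapper (not part of base `R` or any package) that accepts exactly one vector and signals an error if anything else is supplied, so the mistake above fails loudly instead of silently.

```r
# unique_strict() is a hypothetical helper for illustration, not a base R function.
# It takes a single vector and refuses any additional arguments.
unique_strict <- function(x, ...) {
  if (length(list(...)) > 0) {
    stop("unique_strict() takes a single vector; did you mean union()?")
  }
  unique(x)
}

vec1 <- c("a", "b", "c")
vec2 <- c("c", "d")

unique_strict(vec1)
# [1] "a" "b" "c"

# The original mistake now signals an error instead of returning quietly.
msg <- tryCatch(unique_strict(vec1, vec2),
                error = function(e) conditionMessage(e))
print(msg)
```

Dropping `...` from the signature entirely (`function(x)`) would have a similar effect, with `R` itself reporting an "unused argument" error; the explicit check just allows a more helpful message.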

## R Tip: use isTRUE()

R Tip: use `isTRUE()`.

A lot of R functions are *type unstable*, which means they return different types or classes depending on details of their values.

For example, consider `all.equal()`: it returns the logical value `TRUE` when the items being compared are equal:

```r
all.equal(1:3, c(1, 2, 3))
# [1] TRUE
```

However, when the items being compared are not equal, `all.equal()` instead returns a message:

```r
all.equal(1:3, c(1, 2.5, 3))
# [1] "Mean relative difference: 0.25"
```

This can be inconvenient when using functions similar to `all.equal()` as tests in `if()`-statements and other program control structures.

The saving function is `isTRUE()`. `isTRUE()` returns `TRUE` if its argument *value* is equivalent to `TRUE`, and returns `FALSE` otherwise. `isTRUE()` makes `R` programming much easier.
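A small base-`R` illustration, using a hypothetical helper `same_values()`: handing the raw result of `all.equal()` to `if()` fails when the items differ, because `if()` cannot interpret the message string as a logical. Wrapping the test in `isTRUE()` always yields a single `TRUE` or `FALSE`.

```r
x <- 1:3
y <- c(1, 2.5, 3)

# Fragile: all.equal() returns a character message here, and if()
# signals an error because the string is not interpretable as logical.
fragile <- tryCatch(
  if (all.equal(x, y)) "same" else "different",
  error = function(e) e
)
inherits(fragile, "error")
# [1] TRUE

# Robust: isTRUE() maps every non-TRUE result to FALSE.
# same_values() is a hypothetical helper for illustration.
same_values <- function(a, b) {
  if (isTRUE(all.equal(a, b))) "same" else "different"
}
same_values(1:3, c(1, 2, 3))
# [1] "same"
same_values(x, y)
# [1] "different"

# isTRUE() is also FALSE for NA and for logical vectors longer than one.
isTRUE(NA)
# [1] FALSE
isTRUE(c(TRUE, TRUE))
# [1] FALSE
```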

## rqdatatable: rquery Powered by data.table

`rquery` is an `R` package for specifying data transforms using piped Codd-style operators. It has already shown great performance on `PostgreSQL` and `Apache Spark`. `rqdatatable` is a new package that supplies a screaming fast implementation of the `rquery` system in-memory using the `data.table` package.

`rquery` is already *one of* the *fastest* and *most teachable* (due to deliberate conformity to Codd’s influential work) tools to wrangle data on databases and big data systems. And now `rquery` is also *one of* the fastest methods to wrangle data in-memory in `R` (thanks to `data.table`, via a thin adaptation supplied by `rqdatatable`).

## Talking about clinical significance

In statistical work in the age of big data we often get hung up on differences that are statistically significant (reliable enough to show up again and again in repeated measurements), but clinically insignificant (visible in aggregation, but too small to make any real difference to individuals).

An example would be: a diet that changes individual weight by an ounce on average with a standard deviation of a pound. With a large enough population the diet is statistically significant. It could also be used to shave an ounce off a national average weight. But, for any one individual: this diet is largely pointless.

The concept is teachable, but we have always stumbled over the naming “statistical significance” versus “practical clinical significance.”

I am suggesting trying the word “substantial” (and its antonym “insubstantial”) to describe if changes are physically small or large.

This comes down to having to remind people that “p-values are not effect sizes”. In this article we recommended reporting three statistics: a units-based effect size (such as expected delta pounds), a dimensionless effect size (such as Cohen’s d), and a measure of experimental reliability (such as statistical significance, which at best measures only one possible risk: re-sampling risk).
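A minimal worked sketch of the three statistics on the diet example above, using simulated data. The sample size, seed, and normal model are our own assumptions, not results from the article: each individual's weight change is drawn with mean -1/16 pound (one ounce lost) and standard deviation one pound, and Cohen's d is computed in its one-sample form (mean change divided by the standard deviation of change).

```r
set.seed(2018)
n <- 1e6
# Hypothetical simulated weight changes in pounds:
# one ounce (1/16 lb) lost on average, standard deviation of one pound.
delta <- rnorm(n, mean = -1/16, sd = 1)

# 1. Units-based effect size: expected change in pounds.
effect_lbs <- mean(delta)
# 2. Dimensionless effect size: one-sample Cohen's d.
cohens_d <- mean(delta) / sd(delta)
# 3. Reliability (re-sampling risk only): one-sample t-test p-value.
p_value <- t.test(delta, mu = 0)$p.value

# Share of individuals who lost any weight at all
# (about 50% would be expected from a diet with no effect).
share_lost <- mean(delta < 0)

print(c(effect_lbs = effect_lbs, cohens_d = cohens_d,
        p_value = p_value, share_lost = share_lost))
```

With this many subjects the p-value is vanishingly small, yet the effect is about a sixteenth of a pound and only slightly more than half of individuals lose any weight at all: statistically significant, but insubstantial.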

The merit is: if we don’t confound different meanings, we may be less confusing. A downside is: some of these measures are a bit technical to discuss. I’d be interested in hearing opinions, and hearing about teaching experiences with these distinctions.

## WVPlots now at version 1.0.0 on CRAN!

Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. We are excited to announce that WVPlots is now at version 1.0.0 on CRAN!

## wrapr 1.4.1 now up on CRAN

`wrapr 1.4.1` is now available on CRAN. `wrapr` is a really neat `R` package for organizing, meta-programming, and debugging R code. This update generalizes the S3 features of the dot-pipe.

Please give it a try!

## Ready Made Plots make Work Easier

A while back Simon Jackson and Kara Woo shared some great ideas and graphs on grouped bar charts and density plots. Win-Vector LLC's Nina Zumel just added a graph of this type to the development version of WVPlots.

Nina has, as usual, some great documentation here.

## Upcoming speaking engagements

I have a couple of public appearances coming up soon.

- The East Bay R Language Beginners Group: Preparing Datasets – The Ugly Truth & Some Solutions, Tuesday, May 1, 2018 at Robert Half Technologies, 1999 Harrison Street, Oakland, CA, 94612.
- Official May 2018 BARUG Meeting: rquery: a Query Generator for Working With SQL Data, Tuesday, May 8, 2018 at Intuit, Building 20, 2600 Marine Way, Mountain View, CA.