# Category: Opinion

## Tutorial: Using seplyr to Program Over dplyr

Posted in Coding, data science, Opinion, Programming, Statistics, Tutorials; 13 comments

## dplyr 0.7 Made Simpler

Posted in data science, Opinion, Programming, Statistics, Tutorials; 12 comments

## In praise of syntactic sugar

Posted in Opinion, Programming, Statistics, Tutorials; 8 comments

## Working With R and Big Data: Use Replyr

Posted in data science, Opinion, Statistics; 2 comments

## Using wrapr::let() with tidyeval

Posted in Opinion, Programming, Statistics; 2 comments

## Please Consider Using wrapr::let() for Replacement Tasks

Posted in Opinion, Programming, Statistics; 5 comments

## Non-Standard Evaluation and Function Composition in R

Posted in Coding, data science, Opinion, Programming, Statistics, Tutorials; 10 comments

## An easy way to accidentally inflate reported R-squared in linear regression models

Posted in Opinion, Rants, Statistics, Tutorials; 1 comment

## Campaign Response Testing no longer published on Udemy

Posted in Administrativia, Opinion, Statistics

## More on safe substitution in R

Posted in Coding, Opinion, Programming, Statistics; 2 comments

`seplyr` is an `R` package that makes it easy to program over `dplyr` `0.7.*`.

To illustrate this we will work through an example.

Continue reading Tutorial: Using seplyr to Program Over dplyr

I have been writing *a lot* (too much) on the `R` topics `dplyr`/`rlang`/`tidyeval` lately. The reason is that major changes were recently announced. If you are going to use `dplyr` well and correctly going forward, you may need to understand some of the new issues (if you don’t use `dplyr`, you can safely skip all of this). I am trying to work out (publicly) how best to incorporate the new methods into:

- real world analyses,
- reusable packages,
- and teaching materials.

I think some of the apparent discomfort on my part comes from my feeling that `dplyr` never really gave standard evaluation (SE) a fair chance. In my opinion, `dplyr` is based strongly on non-standard evaluation (NSE, originally through `lazyeval` and now through `rlang`/`tidyeval`) more by taste and choice than by actual analyst benefit or need. `dplyr` isn’t my package, so it isn’t my choice to make; but I can still have an informed opinion, which I will discuss below.
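A minimal base-`R` sketch of the SE/NSE distinction, using `subset()` as a stand-in NSE interface rather than `dplyr` (the data frame `d` and the variable `COL` are made up for illustration): an NSE interface takes the column name from the code as typed, while an SE interface takes it from a value, which is what makes SE code easy to program over.

```r
# A tiny example data set (made up for illustration)
d <- data.frame(x = c(1, 2, 3))

# NSE: subset() captures the expression `x == 2` and evaluates it with
# the columns of d visible, so the column is named by the code itself.
nse_literal <- subset(d, x == 2)

# Pitfall when programming over an NSE interface: COL is not a column
# of d, so subset() falls back to the calling environment and compares
# the string "x" to 2. That is FALSE, so no rows come back, with no error.
COL <- "x"
nse_variable <- subset(d, COL == 2)

# SE: the column is looked up by the *value* held in COL.
se_variable <- d[d[[COL]] == 2, , drop = FALSE]

print(nrow(nse_literal))
print(nrow(nse_variable))
print(nrow(se_variable))
```

The middle case is the one that bites when programming over NSE code: nothing fails, the answer is just wrong.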

There has been some talk of adding native pipe notation to R (for example here, here, and here), and even of a `tidyeval`/`rlang` pipe here.

I think a critical aspect of such an extension would be to treat the notation as *syntactic sugar*, and *not* insist that such a pipe match `magrittr` semantics, or, worse yet, give authors a platform to insert their own preferred ad-hoc semantics. Continue reading In praise of syntactic sugar
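To make "pipe as pure syntactic sugar" concrete, here is a minimal sketch of a hypothetical operator `%sugar>%` (not part of any package) whose only job is to rewrite `lhs %sugar>% f(args)` into the ordinary call `f(lhs, args)` and evaluate it where the pipeline was written. It illustrates the rewriting idea only; it is not `magrittr` semantics and not a proposal for base `R`.

```r
`%sugar>%` <- function(lhs, rhs) {
  # Capture both sides as unevaluated expressions
  lhs_expr <- substitute(lhs)
  rhs_expr <- substitute(rhs)
  # Allow a bare function name on the right, as in `x %sugar>% sqrt`
  if (is.name(rhs_expr)) {
    rhs_expr <- as.call(list(rhs_expr))
  }
  # Rewrite f(a, b) into f(lhs, a, b): insert lhs as the first argument
  rewritten <- as.call(c(list(rhs_expr[[1]], lhs_expr), as.list(rhs_expr)[-1]))
  # Evaluate the rewritten call in the caller's environment
  eval(rewritten, parent.frame())
}

r1 <- 4 %sugar>% sqrt
r2 <- c(1, 4, 9) %sugar>% sqrt() %sugar>% sum()
r3 <- c(1, NA, 3) %sugar>% sum(na.rm = TRUE)
```

Because the operator only rewrites code, `x %sugar>% f()` means exactly `f(x)`: there is no extra evaluation environment or placeholder convention for the reader to learn.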

In our latest R and Big Data article we discuss `replyr`.

`replyr` stands for **RE**mote **PLY**ing of big data for **R**.

Why should R users try `replyr`? Because it lets you take a number of common working patterns and apply them to remote data (such as databases or `Spark`).

`replyr` allows users to work with `Spark` or database data much as they work with local `data.frame`s. Some key capability gaps remedied by `replyr` include:

- Summarizing data: `replyr_summary()`.
- Combining tables: `replyr_union_all()`.
- Binding tables by row: `replyr_bind_rows()`.
- Using the split/apply/combine pattern (`dplyr::do()`): `replyr_split()`, `replyr::gapply()`.
- Pivot/anti-pivot (`gather`/`spread`): `replyr_moveValuesToRows()`/`replyr_moveValuesToColumns()`.
- Handle tracking.
- A join controller.

You may have already learned to decompose your local data processing into steps including the above, so retaining such capabilities makes working with `Spark` and `sparklyr` *much* easier. Some of the above capabilities will likely come to the `tidyverse`, but the above implementations are built purely on top of `dplyr` and are the ones already being vetted and debugged at production scale (I think these will be ironed out and reliable sooner).
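For readers who have not used the split/apply/combine pattern listed above, here is a minimal local sketch in base `R` (`split()`, `lapply()`, `do.call(rbind, ...)`) on a small made-up `data.frame`. It is not `replyr` code and does not run against `Spark` or a database; it only shows the shape of the computation that `replyr_split()`/`replyr::gapply()` are described as bringing to remote data.

```r
d <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6),
  stringsAsFactors = FALSE
)

# Split: one data.frame per group
pieces <- split(d, d$group)

# Apply: summarize each piece independently
results <- lapply(pieces, function(di) {
  data.frame(
    group = di$group[[1]],
    mean_value = mean(di$value),
    stringsAsFactors = FALSE
  )
})

# Combine: stack the per-group results back into one data.frame
combined <- do.call(rbind, results)
rownames(combined) <- NULL
print(combined)
```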

While going over some of the discussion related to my last post, I came up with a really neat way to use `wrapr::let()` and `rlang`/`tidyeval` together.

Please read on to see the situation and example. Continue reading Using wrapr::let() with tidyeval

From `dplyr` issue 2916.

The following *appears* to work.

```
suppressPackageStartupMessages(library("dplyr"))
COL <- "homeworld"
# Intended: group by the column whose name is stored in COL
starwars %>%
  group_by(.data[[COL]]) %>%
  head(n=1)
```

```
## # A tibble: 1 x 14
## # Groups: COL [1]
## name height mass hair_color skin_color eye_color birth_year
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl>
## 1 Luke Skywalker 172 77 blond fair blue 19
## # ... with 7 more variables: gender <chr>, homeworld <chr>, species <chr>,
## # films <list>, vehicles <list>, starships <list>, COL <chr>
```

Notice, though, that it reports the grouping as being by "`COL`", not by "`homeworld`". Also, the result now has `14` columns, not the `13` of the original `starwars` data set.
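The idea behind replacement is that the real column name is substituted into the code *before* evaluation, so the called function only ever sees that name. Below is a minimal base-`R` sketch of the idea. `let_sketch()` is a hypothetical helper written for illustration; it is not `wrapr::let()` (which adds checking and other options), and it uses `aggregate()` on a small made-up data frame instead of `dplyr` so it runs without extra packages.

```r
# Hypothetical helper: replace placeholder names in an expression with
# the column names given as strings, then evaluate the result.
let_sketch <- function(alias, expr) {
  expr <- substitute(expr)
  # Map each placeholder (names of alias) to a symbol (values of alias)
  replacements <- lapply(alias, as.name)
  # do.call() hands the captured expression itself to substitute()
  rewritten <- do.call(substitute, list(expr, replacements))
  eval(rewritten, parent.frame())
}

d <- data.frame(
  homeworld = c("Tatooine", "Naboo", "Tatooine"),
  height = c(172, 96, 202),
  stringsAsFactors = FALSE
)
COL <- "homeworld"

# GROUPCOL is rewritten to homeworld before aggregate() runs
res <- let_sketch(
  c(GROUPCOL = COL),
  aggregate(height ~ GROUPCOL, data = d, FUN = max)
)
print(res)
```

Unlike the `.data[[COL]]` attempt above, the grouping column in the result is named `homeworld`, and no extra `COL` column appears.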

Continue reading Please Consider Using wrapr::let() for Replacement Tasks

In this article we will discuss composing standard-evaluation interfaces (SE: parametric, referentially transparent, or “looks only at values”) and composing non-standard-evaluation interfaces (NSE) in `R`.

In `R` the package `tidyeval`/`rlang` is a tool for building domain-specific languages intended to allow easier composition of NSE interfaces.

To use it you must know some of its structure and notation. Here are some details paraphrased from the major `tidyeval`/`rlang` client, the package `dplyr` (`vignette('programming', package = 'dplyr')`).

- "`:=`" is needed to make left-hand-side re-mapping possible (adding yet another "more than one assignment type operator running around" notation issue).
- "`!!`" substitution requires parentheses to safely bind (so the notation is actually "`(!! )`", not "`!!`").
- Left-hand sides of expressions are names or strings, while right-hand sides are `quosures`/expressions.
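The composition problem can be seen without any `tidyeval` machinery. In this minimal base-`R` sketch (all function names are hypothetical), SE functions compose by simply passing values along, while a naive wrapper around an NSE function silently looks up the wrong column. The repaired wrapper has to capture the caller's expression and splice it into the inner call, here with base `R`'s `bquote()`; `rlang`'s `!!` plays a similar splicing role, though its details differ.

```r
d <- data.frame(x = c(1, 2, 3))

# SE: the column is named by a string value
col_mean_se <- function(d, col) mean(d[[col]])
# Composing SE functions is ordinary function calling
report_se <- function(d, col) paste(col, "mean:", col_mean_se(d, col))

# NSE: the column is named by the unevaluated argument expression
col_mean_nse <- function(d, col) mean(d[[deparse(substitute(col))]])

# Naive NSE composition: inside col_mean_nse(), substitute(col) sees the
# symbol `col` (this wrapper's argument name), so it looks up d[["col"]],
# which is NULL; mean(NULL) warns and returns NA.
report_nse_naive <- function(d, col) col_mean_nse(d, col)

# Repaired NSE composition: capture the caller's expression, then splice
# it into the inner call with bquote() before evaluating.
report_nse_fixed <- function(d, col) {
  col_expr <- substitute(col)
  eval(bquote(col_mean_nse(d, .(col_expr))))
}

report_se(d, "x")
col_mean_nse(d, x)
suppressWarnings(report_nse_naive(d, x))
report_nse_fixed(d, x)
```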

Continue reading Non-Standard Evaluation and Function Composition in R

Here is an absolutely *horrible* way to confuse yourself and get an inflated reported `R-squared` on a simple linear regression model in `R`.

We have written about this before, but we found a new twist on the problem (interactions with categorical variable encoding) which we would like to call out here. Continue reading An easy way to accidentally inflate reported R-squared in linear regression models
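One well-known mechanism that fits this description (we assume it is the relevant one) involves exactly a categorical variable and its encoding. With an intercept, `lm()` codes a factor with one fewer indicator column than it has levels; written as `y ~ 0 + g`, it uses one indicator per level, giving the same fitted values. Yet `summary.lm()` then reports an `R-squared` measured against zero rather than against `mean(y)`. The sketch below uses simulated data in which `y` does not depend on `g` at all.

```r
set.seed(2017)
n <- 100
d <- data.frame(g = factor(sample(c("a", "b"), n, replace = TRUE)))
# y has a large mean but does not depend on g at all
d$y <- 10 + rnorm(n)

# With an intercept: R-squared is computed relative to mean(y)
m1 <- lm(y ~ g, data = d)
# Without an intercept: g gets one indicator column per level, so the
# fitted values are the same group means, but summary.lm() now computes
# R-squared relative to zero
m0 <- lm(y ~ 0 + g, data = d)

r2_with_intercept <- summary(m1)$r.squared     # small
r2_without_intercept <- summary(m0)$r.squared  # close to 1

# Recomputing the usual (centered) R-squared for m0 recovers the small value
r2_centered_m0 <- 1 - sum(residuals(m0)^2) / sum((d$y - mean(d$y))^2)
```

Same predictions, very different reported `R-squared`: only the reference point of the summary changed.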

Our free video course Campaign Response Testing is no longer published on Udemy. It remains available for free on YouTube with all source code available from GitHub. I’ll try to correct bad links as I find them.

Please read on for the reasons. Continue reading Campaign Response Testing no longer published on Udemy

Let’s worry a bit about substitution in `R`. Substitution is very powerful, which means it can be both used and misused. However, that does not mean every use is unsafe or a mistake.
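As one small illustration of substitution used safely (a sketch, not the article’s own example): substituting a *symbol* built with `as.name()` into an expression tree cannot inject code, because the whole string becomes a single name, whereas pasting strings together and calling `parse()` could let a hostile string run arbitrary code. The data frame `d` and the placeholder `COL` are made up for illustration.

```r
d <- data.frame(x = c(1, 2, 3))

# Replace the placeholder symbol COL with the symbol named by a string
col_name <- "x"
e_safe <- substitute(mean(d$COL), list(COL = as.name(col_name)))
print(e_safe)
print(eval(e_safe))

# A hostile string is still just one (odd) symbol, so no code is run:
# d has no such column, and `$` simply returns NULL
hostile <- "x); stop('injected'); (1"
e_hostile <- substitute(d$COL, list(COL = as.name(hostile)))
print(is.null(eval(e_hostile)))
```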