# Category: Tutorials

- Tutorial: Using seplyr to Program Over dplyr (categories: Coding, data science, Opinion, Programming, Statistics, Tutorials)
- seplyr update (categories: Administrativia, Exciting Techniques, Statistics, Tutorials)
- dplyr 0.7 Made Simpler (categories: data science, Opinion, Programming, Statistics, Tutorials)
- Better Grouped Summaries in dplyr (categories: data science, Statistics, Tutorials)
- In praise of syntactic sugar (categories: Opinion, Programming, Statistics, Tutorials)
- Join Dependency Sorting (categories: data science, Practical Data Science, Pragmatic Data Science, Programming, Statistics, Tutorials)
- wrapr Implementation Update (categories: Coding, Programming, Statistics, Tutorials)
- Non-Standard Evaluation and Function Composition in R (categories: Coding, data science, Opinion, Programming, Statistics, Tutorials)
- An easy way to accidentally inflate reported R-squared in linear regression models (categories: Opinion, Rants, Statistics, Tutorials)
- Use a Join Controller to Document Your Work (categories: data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials)

## Tutorial: Using seplyr to Program Over dplyr

`seplyr` is an `R` package that makes it easy to program over `dplyr` `0.7.*`.

To illustrate this we will work an example.

Continue reading Tutorial: Using seplyr to Program Over dplyr
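The tutorial's own worked example is behind the link. As a minimal sketch of the idea (assuming `seplyr` and `dplyr` are installed; the variable names `group_col`, `value_col`, `summary_terms`, and `mean_value` are our own illustrative choices), the `seplyr` `_se` verbs take column names and expressions as ordinary strings, so they can be held in variables:

```r
library(dplyr)
library(seplyr)

# Column names held in ordinary character variables (illustrative choices).
group_col <- "cyl"
value_col <- "mpg"

# summarize_se() takes a named character vector: the names are the new
# columns, the values are the expressions (as strings) that compute them.
summary_terms <- setNames(paste0("mean(", value_col, ")"), "mean_value")

res <- mtcars %>%
  group_by_se(group_col) %>%
  summarize_se(summary_terms) %>%
  ungroup()

print(res)
```

Because the column names are plain values, this pipeline could be wrapped in a function taking `group_col` and `value_col` as arguments without any quoting machinery.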

## seplyr update

The development version of my new `R` package `seplyr` is performing *much* better in practical applications with `dplyr` `0.7.*` than even *I* (the `seplyr` package author) expected.

I think I have hit a very good set of trade-offs, and I have now spent significant time creating documentation and examples.

I wish there had been such a package weeks ago, and that I had started using this approach in my own client work at that time. If you are already a `dplyr` user, I *strongly* suggest trying `seplyr` in your own analysis projects.

## dplyr 0.7 Made Simpler

I have been writing *a lot* (too much) on the `R` topics `dplyr`/`rlang`/`tidyeval` lately. The reason: major changes were recently announced. If you are going to use `dplyr` well and correctly going forward, you may need to understand some of the new issues (if you don’t use `dplyr`, you can safely skip all of this). I am trying to work out (publicly) how best to incorporate the new methods into:

- real world analyses,
- reusable packages,
- and teaching materials.

I think some of the apparent discomfort on my part comes from my feeling that `dplyr` never really gave standard evaluation (SE) a fair chance. In my opinion, `dplyr` is based strongly on non-standard evaluation (NSE, originally through `lazyeval` and now through `rlang`/`tidyeval`) more by taste and choice than by actual analyst benefit or need. `dplyr` isn’t my package, so it isn’t my choice to make; but I can still have an informed opinion, which I will discuss below.
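To make the SE/NSE distinction concrete, here is a small base-`R` sketch (the two function names are hypothetical, not from `dplyr` or `seplyr`). The SE version looks only at the value of its argument; the NSE version looks at the argument's unevaluated expression, which is convenient interactively but harder to program over:

```r
# SE: the column is identified by the *value* of col_name (a string).
mean_of_col_se <- function(d, col_name) {
  mean(d[[col_name]])
}

# NSE: the column is identified by the *expression* the caller typed.
mean_of_col_nse <- function(d, col) {
  col_name <- deparse(substitute(col))
  mean(d[[col_name]])
}

mean_of_col_se(mtcars, "mpg")   # mean of mtcars$mpg
mean_of_col_nse(mtcars, mpg)    # same result, no quotes needed

# Programming over the two: the column name is now held in a variable.
v <- "mpg"
a <- mean_of_col_se(mtcars, v)                     # uses the value "mpg"
b <- suppressWarnings(mean_of_col_nse(mtcars, v))  # NA: looks for a column named "v"
```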

## Better Grouped Summaries in dplyr

For `R` `dplyr` users, one of the promises of the new `rlang`/`tidyeval` system is an improved ability to program over `dplyr` itself, in particular to add new verbs that encapsulate previously compound steps into better self-documenting atomic steps.

Let’s take a look at this capability.
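As a minimal sketch of such a verb (the name `grouped_summary` is hypothetical, and this is not the post's own code), `rlang` lets us wrap `group_by()`, `summarize()`, and `ungroup()` into one step whose grouping columns arrive as a character vector. This assumes `dplyr` 0.7 or later:

```r
library(dplyr)
library(rlang)

# Hypothetical compound verb: group by the named columns, summarize, and
# ungroup, so the result carries no leftover grouping downstream.
grouped_summary <- function(.data, group_vars, ...) {
  .data %>%
    group_by(!!!syms(group_vars)) %>%   # splice in the grouping symbols
    summarize(...) %>%                  # summary expressions forwarded as-is
    ungroup()
}

res <- grouped_summary(mtcars, c("cyl", "gear"),
                       mean_mpg = mean(mpg), n = n())
print(res)
```

The caller sees one self-documenting step, and the grouping columns can come from a variable.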

## In praise of syntactic sugar

There has been some talk of adding native pipe notation to R (for example here, here, and here), and even a `tidyeval`/`rlang` pipe here.

I think a critical aspect of such an extension would be to treat the notation as *syntactic sugar*, and *not* to insist that such a pipe match `magrittr` semantics or, worse yet, give authors a platform to insert their own preferred ad-hoc semantics.

Continue reading In praise of syntactic sugar
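To illustrate what "pipe as syntactic sugar" can mean, here is a small base-`R` sketch of a dot pipe. The operator name `%dot>%` is hypothetical; the authors' `wrapr` package ships its own dot-arrow pipe, `%.>%`, and this sketch does not claim to reproduce its implementation. The rule is purely textual: `a %dot>% f(.)` means "evaluate `f(.)` with `.` bound to `a`", with no hidden insertion of a first argument:

```r
# Hypothetical dot pipe: bind the left value to "." and evaluate the
# right-hand expression, unchanged, in a child of the caller's environment.
`%dot>%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)              # capture the right side unevaluated
  env <- new.env(parent = parent.frame())  # keep the caller's variables visible
  assign(".", lhs, envir = env)            # "." now holds the left-hand value
  eval(rhs_expr, envir = env)
}

r1 <- 16 %dot>% sqrt(.)                        # 4
r2 <- c(1, 4, 9) %dot>% sum(.) %dot>% sqrt(.)  # sqrt(14)
```

Because `.` must be written explicitly, `x %dot>% f(.)` and `f(x)` mean the same thing, which is the sense in which such a notation is only sugar.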

## Join Dependency Sorting

In our latest installment of “`R` and big data” let’s again discuss the task of left joining many tables from a data warehouse using `R` and a system called "a join controller" (last discussed here).

One of the great advantages of specifying complicated sequences of operations in data (rather than in code) is that it is often easier to transform and extend data. Explicit rich data beats vague convention and complicated code.
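A minimal sketch of "operations as data" in base `R`, assuming small in-memory tables (all table and column names here are made up for illustration; this is not the `replyr` join controller itself). The join plan is a data frame, and a short loop applies it:

```r
# Illustrative tables: orders refer to customers, customers refer to regions.
orders    <- data.frame(order_id = 1:4, cust_id = c(1, 2, 2, 3),
                        amount = c(5, 7, 3, 9))
customers <- data.frame(cust_id = 1:3, region_id = c(10, 20, 10))
regions   <- data.frame(region_id = c(10, 20), region = c("east", "west"),
                        stringsAsFactors = FALSE)
tables <- list(customers = customers, regions = regions)

# The join plan is data: one row per left join, applied top to bottom.
plan <- data.frame(table = c("customers", "regions"),
                   by    = c("cust_id", "region_id"),
                   stringsAsFactors = FALSE)

res <- orders
for (i in seq_len(nrow(plan))) {
  res <- merge(res, tables[[plan$table[i]]], by = plan$by[i],
               all.x = TRUE, sort = FALSE)   # all.x = TRUE gives a left join
}
print(res)
```

Swapping the two rows of `plan` would make `merge()` stop with an error, because `orders` has no `region_id` column until `customers` has been joined in. We assume ordering a plan so that every join's keys already exist is the kind of dependency the post's title refers to.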

## wrapr Implementation Update

The ~~development version~~ CRAN version of our `R` helper function `wrapr::let()` has switched from string-based substitution to abstract syntax tree based substitution (AST based substitution, or language based substitution).

I am looking for feedback from users already doing substantial work with `wrapr::let()`. If you are already using `wrapr::let()`, please test whether the current development version of `wrapr` works with your code. If you run into problems: I apologize, and please file a `GitHub` issue.
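The difference between the two substitution styles can be seen with base `R` alone. This sketch (not `wrapr`'s implementation; the placeholder names `COL` and `NEWCOL` are illustrative) substitutes symbols in a quoted expression, and compares that with naive text replacement, which also rewrites the `COL` inside `NEWCOL`:

```r
# A template expression with two placeholder symbols.
template <- quote(d$NEWCOL <- d$COL * 2)

# AST (language) substitution: only whole symbols are replaced.
mapping <- list(COL = as.name("x"), NEWCOL = as.name("y"))
ast_version <- do.call(substitute, list(template, mapping))
print(ast_version)   # d$y <- d$x * 2

# String substitution: every occurrence of the text "COL" is replaced,
# including the one inside "NEWCOL".
str_version <- gsub("COL", "x", deparse(template), fixed = TRUE)
print(str_version)   # "d$NEWx <- d$x * 2"

# The AST version evaluates as intended.
d <- data.frame(x = 1:3)
eval(ast_version)
print(d$y)           # 2 4 6
```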

## Non-Standard Evaluation and Function Composition in R

In this article we will discuss composing standard-evaluation interfaces (SE: parametric, referentially transparent, or “looks only at values”) and composing non-standard-evaluation interfaces (NSE) in `R`.

In `R` the package `tidyeval`/`rlang` is a tool for building domain specific languages intended to allow easier composition of NSE interfaces.

To use it you must know some of its structure and notation. Here are some details paraphrased from the major `tidyeval`/`rlang` client, the package `dplyr` (`vignette('programming', package = 'dplyr')`):

- "`:=`" is needed to make left-hand-side re-mapping possible (adding yet another "more than one assignment type operator running around" notation issue).
- "`!!`" substitution requires parentheses to bind safely (so the notation is actually "`(!! )`", not "`!!`").
- Left-hand sides of expressions are names or strings, while right-hand sides are `quosures`/expressions.

Continue reading Non-Standard Evaluation and Function Composition in R
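A minimal sketch of the notation described in the bullets above, assuming `dplyr` 0.7 or later (the function `add_doubled` is hypothetical). The caller passes column names as strings; `sym()` turns the input name into a symbol for the right-hand side, `!!` substitutes it (parenthesized as the bullets describe), and `:=` allows the left-hand side to be a computed string:

```r
library(dplyr)
library(rlang)

# Hypothetical helper: writes 2 * d[[col_name]] into a column named result_name.
add_doubled <- function(d, result_name, col_name) {
  col <- sym(col_name)                       # right-hand side: string -> symbol
  mutate(d, !!result_name := (!!col) * 2)    # left-hand side: a string, via :=
}

res <- add_doubled(data.frame(x = 1:3), "y", "x")
print(res)
```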

## An easy way to accidentally inflate reported R-squared in linear regression models

Here is an absolutely *horrible* way to confuse yourself and get an inflated reported `R-squared` on a simple linear regression model in `R`.

We have written about this before, but we found a new twist on the problem (interactions with categorical variable encoding) which we would like to call out here.

Continue reading An easy way to accidentally inflate reported R-squared in linear regression models
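A minimal base-`R` sketch of one way this can happen, using made-up data (our own illustration; we assume, without seeing the full post, that the twist is of this kind). With a single categorical variable, `y ~ g` and `y ~ 0 + g` produce exactly the same fitted values, because dropping the intercept just gives every level its own coefficient. But `summary.lm()` measures R-squared against the mean of `y` only when the model has an intercept, and against zero otherwise, so the no-intercept fit reports a larger number:

```r
d <- data.frame(
  g = factor(c("a", "a", "b", "b", "c", "c")),
  y = c(10, 11, 12, 12.5, 9, 10.5)
)

m_int   <- lm(y ~ g, data = d)      # intercept plus two level contrasts
m_noint <- lm(y ~ 0 + g, data = d)  # one coefficient per level, no intercept

# Same fitted values: the two models span the same column space.
same_fit <- isTRUE(all.equal(unname(fitted(m_int)), unname(fitted(m_noint))))

r2_int   <- summary(m_int)$r.squared
r2_noint <- summary(m_noint)$r.squared

# Reproduce both numbers by hand: only the reference point of the total
# sum of squares differs (mean(y) versus 0).
rss <- sum(residuals(m_int)^2)
r2_int_manual   <- 1 - rss / sum((d$y - mean(d$y))^2)
r2_noint_manual <- 1 - rss / sum(d$y^2)

print(c(r2_int = r2_int, r2_noint = r2_noint))
```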

## Use a Join Controller to Document Your Work

This note describes a useful `replyr` tool we call a "join controller" (it is part of our "R and Big Data" series; please see here for the introduction, and here for one of our big data courses).

Continue reading Use a Join Controller to Document Your Work