Posted on Categories Programming, TutorialsTags , , , , , Leave a comment on Piping into ggplot2

Piping into ggplot2

In our wrapr pipe RJournal article we used piping into ggplot2 layers/geoms/items as an example.

Being able to use the same pipe operator for data processing steps and for ggplot2 layering is a question that comes up from time to time (for example: Why can’t ggplot2 use %>%?). In fact the primary ggplot2 package author wishes that magrittr piping was the composing notation for ggplot2 (though it is obviously too late to change).

There are some fundamental difficulties in trying to use the magrittr pipe in such a way. In particular magrittr looks for its own pipe by name in un-evaluated code, and thus is difficult to engineer over (though it can be hacked around). The general concept is: pipe stages are usually functions or function calls, and ggplot2 components are objects (verbs versus nouns); and at first these seem incompatible.

However, the wrapr dot-arrow-pipe was designed to handle such distinctions.

Let’s work an example.

Continue reading Piping into ggplot2

Posted on Categories Opinion, TutorialsTags , , Leave a comment on Some R Guides: tidyverse and data.table Versions

Some R Guides: tidyverse and data.table Versions

Saghir Bashir of ilustat recently shared a nice getting started with R and tidyverse guide.

NewImage

In addition they were generous enough to link to Dirk Eddelbuette’s later adaption of the guide to use data.table.

NewImage

This type of cooperation and user choice is what keeps the R community vital. Please encourage it. (Heck, please insist on it!)

Posted on Categories Exciting Techniques, Practical Data Science, Pragmatic Data Science, Statistics, TutorialsTags , , 1 Comment on Quick Significance Calculations for A/B Tests in R

Quick Significance Calculations for A/B Tests in R

Introduction

Let’s take a quick look at a very important and common experimental problem: checking if the difference in success rates of two Binomial experiments is statistically significant. This can arise in A/B testing situations such as online advertising, sales, and manufacturing.

We already share a free video course on a Bayesian treatment of planning and evaluating A/B tests (including a free Shiny application). Let’s now take a look at the should be simple task of simply building a summary statistic that includes a classic frequentist significance.

Continue reading Quick Significance Calculations for A/B Tests in R

Posted on Categories data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, TutorialsTags , , , Leave a comment on Modeling muti-category Outcomes With vtreat

Modeling muti-category Outcomes With vtreat

vtreat is a powerful R package for preparing messy real-world data for machine learning. We have further extended the package with a number of features including rquery/rqdatatable integration (allowing vtreat application at scale on Apache Spark or data.table!).

In addition vtreat and can now effectively prepare data for multi-class classification or multinomial modeling.

Continue reading Modeling muti-category Outcomes With vtreat

Posted on Categories Opinion, Programming, TutorialsTags , 5 Comments on A Subtle Flaw in Some Popular R NSE Interfaces

A Subtle Flaw in Some Popular R NSE Interfaces

It is no great secret: I like value oriented interfaces that preserve referential transparency. It is the side of the public debate I take in R programming.

"One of the most useful properties of expressions is that called by Quine referential transparency. In essence this means that if we wish to find the value of an expression which contains a sub-expression, the only thing we need to know about the sub-expression is its value."

Christopher Strachey, "Fundamental Concepts in Programming Languages", Higher-Order and Symbolic Computation, 13, 1149, 2000, Kluwer Academic Publishers (lecture notes written by Christopher Strachey for the International Summer School in Computer Programming at Copenhagen in August, 1967).

Please read on for discussion of a subtle bug shared by a few popular non-standard evaluation interfaces.

Continue reading A Subtle Flaw in Some Popular R NSE Interfaces

Posted on Categories Opinion, Programming, TutorialsTags , , , Leave a comment on Timing Column Indexing in R

Timing Column Indexing in R

I’ve ended up (almost accidentally) collecting a number of different solutions to the “use a column to choose values from other columns in R” problem.

Please read on for a brief benchmark comparing these methods/solutions.

Continue reading Timing Column Indexing in R

Posted on Categories Coding, data science, Programming, TutorialsTags , , , , 15 Comments on Using a Column as a Column Index

Using a Column as a Column Index

We recently saw a great recurring R question: “how do you use one column to choose a different value for each row?” That is: how do you use a column as an index? Please read on for some idiomatic base R, data.table, and dplyr solutions.

Continue reading Using a Column as a Column Index

Posted on Categories Opinion, Programming, TutorialsTags , , , , 3 Comments on Parameterizing with bquote

Parameterizing with bquote

One thing that is sure to get lost in my long note on macros in R is just how concise and powerful macros are. The problem is macros are concise, but they do a lot for you. So you get bogged down when you explain the joke.

Let’s try to be concise.

Continue reading Parameterizing with bquote

Posted on Categories Programming, TutorialsTags , , , 5 Comments on Announcing wrapr 1.6.2

Announcing wrapr 1.6.2

wrapr 1.6.2 is now up on CRAN. We have some neat new features for R users to try (in addition to many earlier wrapr goodies).



Continue reading Announcing wrapr 1.6.2

Posted on Categories Programming, TutorialsTags 2 Comments on A Quick Appreciation of the R transform Function

A Quick Appreciation of the R transform Function

R users who also use the dplyr package will be able to quickly understand the following code that adds an estimated area column to a data.frame.

suppressPackageStartupMessages(library("dplyr"))

iris %>%
  mutate(
    ., 
    Petal.Area = (pi/4)*Petal.Width*Petal.Length) %>%
  head(.)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708

The notation we used above is the "explicit argument" variation we recommend for readability. What a lot of dplyr users do not seem to know is: base-R already has this functionality. The function is called transform().

To demonstrate this, let’s first detach dplyr to show that we are not using functions from dplyr.

detach("package:dplyr", unload = TRUE)

Now let’s write the equivalent pipeline using exclusively base-R.

iris ->.
   transform(
     ., 
     Petal.Area = (pi/4)*Petal.Width*Petal.Length) ->.
   head(.)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa  0.2199115
## 2          4.9         3.0          1.4         0.2  setosa  0.2199115
## 3          4.7         3.2          1.3         0.2  setosa  0.2042035
## 4          4.6         3.1          1.5         0.2  setosa  0.2356194
## 5          5.0         3.6          1.4         0.2  setosa  0.2199115
## 6          5.4         3.9          1.7         0.4  setosa  0.5340708

The "->." notation is the end-of-line variation of the Bizarro Pipe. The transform() function has been part of R since 1998. dplyr::mutate() was introduced in 2014.

git log --all -p --reverse --source -S 'transform <-'

commit 41c2f7338c45dbf9eac99c210206bc3657bca98a refs/remotes/origin/tags/R-0-62-4
Author: pd <pd@00db46b3-68df-0310-9c12-caf00c1e9a41>
Date:   Wed Feb 11 18:31:12 1998 +0000

    Added the frametools functions subset() and transform()
    
    git-svn-id: https://svn.r-project.org/R/trunk@709 00db46b3-68df-0310-9c12-caf00c1e9a41