Posted on Categories Opinion, Programming, Rants, StatisticsTags , , Leave a comment on Is dplyr Easily Comprehensible?

Is dplyr Easily Comprehensible?

dplyr is one of the most popular R packages. It is powerful and important. But is it in fact easily comprehensible? Continue reading Is dplyr Easily Comprehensible?

Posted on Categories Administrativia, Opinion, StatisticsTags , , Leave a comment on Thank You For The Very Nice Comment

Thank You For The Very Nice Comment

Somebody nice reached out and gave us this wonderful feedback on our new Supervised Learning in R: Regression (paid) video course.

Thanks for a wonderful course on DataCamp on XGBoost and Random forest. I was struggling with Xgboost earlier and Vtreat has made my life easy now :).

Continue reading Thank You For The Very Nice Comment

Posted on Categories Administrativia, data science, Practical Data Science, Pragmatic Data Science, StatisticsTags , 4 Comments on Supervised Learning in R: Regression

Supervised Learning in R: Regression

We are very excited to announce a new (paid) Win-Vector LLC video training course: Supervised Learning in R: Regression now available on DataCamp

Shield image course 3851 20170725 24872 3f982z Continue reading Supervised Learning in R: Regression

Posted on Categories Opinion, Programming, StatisticsTags , , , , , 9 Comments on Let’s Have Some Sympathy For The Part-time R User

Let’s Have Some Sympathy For The Part-time R User

When I started writing about methods for better "parametric programming" interfaces for dplyr for R dplyr users in December of 2016 I encountered three divisions in the audience:

  • dplyr users who had such a need, and wanted such extensions.
  • dplyr users who did not have such a need ("we always know the column names").
  • dplyr users who found the then-current fairly complex "underscore" and lazyeval system sufficient for the task.

Needing name substitution is a problem an advanced full-time R user can solve on their own. However a part-time R would greatly benefit from a simple, reliable, readable, documented, and comprehensible packaged solution. Continue reading Let’s Have Some Sympathy For The Part-time R User

Posted on Categories Administrativia, data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, StatisticsTags , , , , , , , 1 Comment on More documentation for Win-Vector R packages

More documentation for Win-Vector R packages

The Win-Vector public R packages now all have new pkgdown documentation sites! (And, a thank-you to Hadley Wickham for developing the pkgdown tool.)

Please check them out (hint: vtreat is our favorite).

NewImage Continue reading More documentation for Win-Vector R packages

Posted on Categories Coding, data science, Opinion, Programming, Statistics, TutorialsTags , , , , 13 Comments on Tutorial: Using seplyr to Program Over dplyr

Tutorial: Using seplyr to Program Over dplyr

seplyr is an R package that makes it easy to program over dplyr 0.7.*.

To illustrate this we will work an example.

Continue reading Tutorial: Using seplyr to Program Over dplyr

Posted on Categories Administrativia, Exciting Techniques, Statistics, TutorialsTags , , 1 Comment on seplyr update

seplyr update

The development version of my new R package seplyr is performing in practical applications with dplyr 0.7.* much better than even I (the seplyr package author) expected.

I think I have hit a very good set of trade-offs, and I have now spent significant time creating documentation and examples.

I wish there had been such a package weeks ago, and that I had started using this approach in my own client work at that time. If you are already a dplyr user I strongly suggest trying seplyr in your own analysis projects.

Please see here for details.

Posted on Categories data science, Opinion, Programming, Statistics, TutorialsTags , , , , 12 Comments on dplyr 0.7 Made Simpler

dplyr 0.7 Made Simpler

I have been writing a lot (too much) on the R topics dplyr/rlang/tidyeval lately. The reason is: major changes were recently announced. If you are going to use dplyr well and correctly going forward you may need to understand some of the new issues (if you don’t use dplyr you can safely skip all of this). I am trying to work out (publicly) how to best incorporate the new methods into:

  • real world analyses,
  • reusable packages,
  • and teaching materials.

I think some of the apparent discomfort on my part comes from my feeling that dplyr never really gave standard evaluation (SE) a fair chance. In my opinion: dplyr is based strongly on non-standard evaluation (NSE, originally through lazyeval and now through rlang/tidyeval) more by the taste and choice than by actual analyst benefit or need. dplyr isn’t my package, so it isn’t my choice to make; but I can still have an informed opinion, which I will discuss below.

Continue reading dplyr 0.7 Made Simpler

Posted on Categories data science, Statistics, TutorialsTags , , , 10 Comments on Better Grouped Summaries in dplyr

Better Grouped Summaries in dplyr

For R dplyr users one of the promises of the new rlang/tidyeval system is an improved ability to program over dplyr itself. In particular to add new verbs that encapsulate previously compound steps into better self-documenting atomic steps.

Let’s take a look at this capability.

Continue reading Better Grouped Summaries in dplyr