Posted on Categories Administrativia, Statistics, TutorialsTags , , Leave a comment on New screencast: using R and RStudio to install and experiment with Apache Spark

New screencast: using R and RStudio to install and experiment with Apache Spark

I have new short screencast up: using R and RStudio to install and experiment with Apache Spark.

More material from my recent Strata workshop Modeling big data with R, sparklyr, and Apache Spark can be found here.

Posted on Categories Administrativia, Practical Data Science, Pragmatic Data Science, Pragmatic Machine LearningTags , , , , Leave a comment on Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article

Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article

We have updated the errata for Practical Data Science with R to reflect that it is no longer worth the effort to use the Java version of SQLScrewdriver as described.

Screwdriver

We are very sorry for any confusion, trouble, or wasted effort bringing in Java software (something we are very familiar with, but forget not everybody uses) has caused readers. Also, database adapters for R have greatly improved, so we feel more confident depending on them alone. Practical Data Science with R remains an excellent book and a good resource to learn from that we are very proud of and fully support (hence errata). Continue reading Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article

Posted on Categories Administrativia, StatisticsTags , , Leave a comment on Some Win-Vector R packages

Some Win-Vector R packages

This post concludes our mini-series of Win-Vector open source R packages. We end with WVPlots, a collection of ready-made ggplot2 plots we find handy.

IMG 6061

Please read on for list of some of the Win-Vector LLC open-source R packages that we are pleased to share. Continue reading Some Win-Vector R packages

Posted on Categories Programming, StatisticsTags , , , Leave a comment on sigr: Simple Significance Reporting

sigr: Simple Significance Reporting

sigr is a simple R package that conveniently formats a few statistics and their significance tests. This allows the analyst to use the correct test no matter what modeling package or procedure they use.

Sigr Continue reading sigr: Simple Significance Reporting

Posted on Categories Exciting Techniques, Pragmatic Data Science, Programming, Statistics, TutorialsTags , , , , Leave a comment on Step-Debugging magrittr/dplyr Pipelines in R with wrapr and replyr

Step-Debugging magrittr/dplyr Pipelines in R with wrapr and replyr

In this screencast we demonstrate how to easily and effectively step-debug magrittr/dplyr pipelines in R using wrapr and replyr.



Continue reading Step-Debugging magrittr/dplyr Pipelines in R with wrapr and replyr

Posted on Categories Programming, StatisticsTags , , , Leave a comment on replyr: Get a Grip on Big Data in R

replyr: Get a Grip on Big Data in R

replyr is an R package that contains extensions, adaptions, and work-arounds to make remote R dplyr data sources (including big data systems such as Spark) behave more like local data. This allows the analyst to more easily develop and debug procedures that simultaneously work on a variety of data services (in-memory data.frame, SQLite, PostgreSQL, and Spark2 currently being the primary supported platforms).

Replyrs Continue reading replyr: Get a Grip on Big Data in R

Posted on Categories Opinion, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, TutorialsTags , , , Leave a comment on vtreat: prepare data

vtreat: prepare data

This article is on preparing data for modeling in R using vtreat.

Vtreat Continue reading vtreat: prepare data

Posted on Categories Coding, Opinion, StatisticsTags , , , , , , , , , 7 Comments on wrapr: for sweet R code

wrapr: for sweet R code

This article is on writing sweet R code using the wrapr package.


Wrapr
Continue reading wrapr: for sweet R code

Posted on Categories Opinion, Programming, TutorialsTags , , , 1 Comment on Iteration and closures in R

Iteration and closures in R

I recently read an interesting thread on unexpected behavior in R when creating a list of functions in a loop or iteration. The issue is solved, but I am going to take the liberty to try and re-state and slow down the discussion of the problem (and fix) for clarity.

The issue is: are references or values captured during iteration?

Many users expect values to be captured. Most programming language implementations capture variables or references (leading to strange aliasing issues). It is confusing (especially in R, which pushes so far in the direction of value oriented semantics) and best demonstrated with concrete examples.


NewImage

Please read on for a some of the history and future of this issue. Continue reading Iteration and closures in R

Posted on Categories data science, Practical Data Science, Pragmatic Data Science, Programming, Statistics, TutorialsTags , , , , 16 Comments on The Zero Bug

The Zero Bug

I am going to write about an insidious statistical, data analysis, and presentation fallacy I call “the zero bug” and the habits you need to cultivate to avoid it.


The zero bug

The zero bug

Here is the zero bug in a nutshell: common data aggregation tools often can not “count to zero” from examples, and this causes problems. Please read on for what this means, the consequences, and how to avoid the problem. Continue reading The Zero Bug