Posted on Categories data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, TutorialsTags , , , , 2 Comments on Use a Join Controller to Document Your Work

Use a Join Controller to Document Your Work

This note describes a useful replyr tool we call a "join controller" (and is part of our "R and Big Data" series, please see here for the introduction, and here for one our big data courses).

Continue reading Use a Join Controller to Document Your Work

Posted on Categories art, Expository Writing, OpinionTags , , , , , 1 Comment on Visualizing relational joins

Visualizing relational joins

I want to discuss a nice series of figures used to teach relational join semantics in R for Data Science by Garrett Grolemund and Hadley Wickham, O’Reilly 2016. Below is an example from their book illustrating an inner join:

NewImage

Please read on for my discussion of this diagram and teaching joins. Continue reading Visualizing relational joins

Posted on Categories data science, Practical Data Science, Pragmatic Data Science, Programming, Statistics, TutorialsTags , , , , 16 Comments on The Zero Bug

The Zero Bug

I am going to write about an insidious statistical, data analysis, and presentation fallacy I call “the zero bug” and the habits you need to cultivate to avoid it.


The zero bug

The zero bug

Here is the zero bug in a nutshell: common data aggregation tools often can not “count to zero” from examples, and this causes problems. Please read on for what this means, the consequences, and how to avoid the problem. Continue reading The Zero Bug