It has been our experience when teaching the data wrangling part of data science that students often have difficulty understanding the conversion to and from row-oriented and column-oriented data formats (what is commonly called pivoting and un-pivoting).
Real trust and understanding of this concept doesn’t fully form until one realizes that rows and columns are inessential implementation details when reasoning about your data. Many algorithms are sensitive to how data is arranged in rows and columns, so there is a need to convert between representations. However, confusing representation with semantics slows down understanding.
In this article we will try to separate representation from semantics. We will advocate for thinking in terms of coordinatized data, and demonstrate advanced data wrangling in R.
I had the privilege of having Peter Wang himself demonstrate datashader for me and answer a few of my questions.
I am so excited about datashader capabilities I literally will not wait for the functionality to be exposed in R through rbokeh. I am going to leave my usual knitr/rmarkdown world and dust off Jupyter Notebook just to use datashader plotting. This is worth trying, even for diehard R users. Continue reading Datashader is a big deal
sigr is a simple R package that conveniently formats a few statistics and their significance tests. This allows the analyst to use the correct test no matter what modeling package or procedure they use.