Posted on Categories data science, Expository Writing, Opinion, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Programming, Statistics, TutorialsTags , , , , , , , 1 Comment on Coordinatized Data: A Fluid Data Specification

Coordinatized Data: A Fluid Data Specification

Authors: John Mount and Nina Zumel.

Introduction

It has been our experience when teaching the data wrangling part of data science that students often have difficulty understanding the conversion to and from row-oriented and column-oriented data formats (what is commonly called pivoting and un-pivoting).

Real trust and understanding of this concept doesn’t fully form until one realizes that rows and columns are inessential implementation details when reasoning about your data. Many algorithms are sensitive to how data is arranged in rows and columns, so there is a need to convert between representations. However, confusing representation with semantics slows down understanding.

In this article we will try to separate representation from semantics. We will advocate for thinking in terms of coordinatized data, and demonstrate advanced data wrangling in R.

Continue reading Coordinatized Data: A Fluid Data Specification

Posted on Categories Opinion, Statistics, TutorialsTags , , , , , , ,

Debugging Pipelines in R with Bizarro Pipe and Eager Assignment

This is a note on debugging magrittr pipelines in R using Bizarro Pipe and eager assignment.


Moth
Continue reading Debugging Pipelines in R with Bizarro Pipe and Eager Assignment

Posted on Categories StatisticsTags , , , , , , , , 9 Comments on Datashader is a big deal

Datashader is a big deal

I recently got back from Strata West 2017 (where I ran a very well received workshop on R and Spark). One thing that really stood out for me at the exhibition hall was Bokeh plus datashader from Continuum Analytics.

I had the privilege of having Peter Wang himself demonstrate datashader for me and answer a few of my questions.

I am so excited about datashader capabilities I literally will not wait for the functionality to be exposed in R through rbokeh. I am going to leave my usual knitr/rmarkdown world and dust off Jupyter Notebook just to use datashader plotting. This is worth trying, even for diehard R users. Continue reading Datashader is a big deal

Posted on Categories Administrativia, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, StatisticsTags ,

Practical Data Science with R: ACM SIGACT News Book Review and Discount!

Our book Practical Data Science with R has just been reviewed in Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory (ACM SIGACT) News by Dr. Allan M. Miller (U.C. Berkeley)!


NewImage

The book is half off at Manning March 21st 2017 using the following code (please share/Tweet):

Deal of the Day March 21: Half off my book Practical Data Science with R. Use code dotd032117au at https://www.manning.com/dotd

Please read on for links and excerpts from the review. Continue reading Practical Data Science with R: ACM SIGACT News Book Review and Discount!

Posted on Categories Coding, OpinionTags , , , , 1 Comment on Another R [Non-]Standard Evaluation Idea

Another R [Non-]Standard Evaluation Idea

Jonathan Carroll had a an interesting R language idea: to use @-notation to request value substitution in a non-standard evaluation environment (inspired by msyql User-Defined Variables).

He even picked the right image:

PandorasBox Continue reading Another R [Non-]Standard Evaluation Idea

Posted on Categories Administrativia, Statistics, TutorialsTags , ,

New screencast: using R and RStudio to install and experiment with Apache Spark

I have new short screencast up: using R and RStudio to install and experiment with Apache Spark.

More material from my recent Strata workshop Modeling big data with R, sparklyr, and Apache Spark can be found here.

Posted on Categories Administrativia, Practical Data Science, Pragmatic Data Science, Pragmatic Machine LearningTags , , , ,

Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article

We have updated the errata for Practical Data Science with R to reflect that it is no longer worth the effort to use the Java version of SQLScrewdriver as described.

Screwdriver

We are very sorry for any confusion, trouble, or wasted effort bringing in Java software (something we are very familiar with, but forget not everybody uses) has caused readers. Also, database adapters for R have greatly improved, so we feel more confident depending on them alone. Practical Data Science with R remains an excellent book and a good resource to learn from that we are very proud of and fully support (hence errata). Continue reading Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article

Posted on Categories Administrativia, StatisticsTags , ,

Some Win-Vector R packages

This post concludes our mini-series of Win-Vector open source R packages. We end with WVPlots, a collection of ready-made ggplot2 plots we find handy.

IMG 6061

Please read on for list of some of the Win-Vector LLC open-source R packages that we are pleased to share. Continue reading Some Win-Vector R packages

Posted on Categories Programming, StatisticsTags , , ,

sigr: Simple Significance Reporting

sigr is a simple R package that conveniently formats a few statistics and their significance tests. This allows the analyst to use the correct test no matter what modeling package or procedure they use.

Sigr Continue reading sigr: Simple Significance Reporting

Posted on Categories Exciting Techniques, Pragmatic Data Science, Programming, Statistics, TutorialsTags , , , ,

Step-Debugging magrittr/dplyr Pipelines in R with wrapr and replyr

In this screencast we demonstrate how to easily and effectively step-debug magrittr/dplyr pipelines in R using wrapr and replyr.



Continue reading Step-Debugging magrittr/dplyr Pipelines in R with wrapr and replyr