Use Pseudo-Aggregators to Add Safety Checks to Your Data-Wrangling Workflow

One of the concepts we teach both in Practical Data Science with R and in our theory of data shaping is the importance of identifying the roles of the columns in your data.

For example, to think in terms of multi-row records it helps to identify:

  • Which columns are keys (taken together, they identify rows or records).
  • Which columns are data/payload (treated as freely varying data).
  • Which columns are "derived" (functions of the keys).
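As a rough illustration of these roles, here is a minimal sketch in Python (not the R used elsewhere in our work). The table, its column names, and the helper `check_derived_columns` are all hypothetical, made up for this example. The idea it shows: if a column is claimed to be derived, then it should take exactly one value per key, and that claim can be checked mechanically.

```python
from collections import defaultdict

# Hypothetical example data: each student record spans several rows.
#   record key: student; row key within the table: (student, test)
#   payload:    score (free to vary from row to row)
#   derived:    school (assumed to be a function of student)
ROWS = [
    {"student": "a", "test": 1, "school": "north", "score": 80},
    {"student": "a", "test": 2, "school": "north", "score": 85},
    {"student": "b", "test": 1, "school": "south", "score": 70},
    {"student": "b", "test": 2, "school": "east", "score": 75},
]


def check_derived_columns(rows, key_cols, derived_cols):
    """Return sorted (key, column) pairs where a column that is supposed to be
    a function of the key takes more than one value for that key."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[k] for k in key_cols)
        for col in derived_cols:
            seen[(key, col)].add(row[col])
    return sorted(kc for kc, values in seen.items() if len(values) > 1)


if __name__ == "__main__":
    # Student "b" is listed under two schools, so the claimed role is violated.
    print(check_derived_columns(ROWS, ["student"], ["school"]))
```

An empty result means the data agrees with the declared roles. A non-empty result lists the keys that need a closer look before any grouping or joining step relies on that column being constant within a record.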

In this note we will show how to use some of these ideas to write safer data-wrangling code.