Categories: Administrativia, Programming, Statistics

Binning Data in a Database

Roz King just wrote an interesting article on binning data (a common data analytics step) in a database. They compare a case-based approach (where the bin divisions are stuffed into code) with a join-based approach. They share code and timings.
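To make the two approaches concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is not the code from the article and does not use rquery; the table names (`d`, `bins`), column names, and bin boundaries are all made up for illustration.

```python
import sqlite3

# In-memory database with a small, hypothetical data table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (id INTEGER, x REAL)")
conn.executemany(
    "INSERT INTO d VALUES (?, ?)",
    [(1, 0.5), (2, 3.2), (3, 7.9), (4, 5.0)],
)

# Case-based binning: the bin boundaries are written into the query itself.
case_sql = """
SELECT id, x,
       CASE WHEN x < 2.5 THEN 'low'
            WHEN x < 5.0 THEN 'mid'
            ELSE 'high'
       END AS bin
FROM d
ORDER BY id
"""

# Join-based binning: the bin boundaries live in their own table,
# and each row is matched to the half-open interval [lo, hi) containing x.
conn.execute("CREATE TABLE bins (bin TEXT, lo REAL, hi REAL)")
conn.executemany(
    "INSERT INTO bins VALUES (?, ?, ?)",
    [("low", -1e300, 2.5), ("mid", 2.5, 5.0), ("high", 5.0, 1e300)],
)
join_sql = """
SELECT d.id, d.x, bins.bin
FROM d
JOIN bins ON d.x >= bins.lo AND d.x < bins.hi
ORDER BY d.id
"""

case_rows = conn.execute(case_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(case_rows)
print(join_rows)
```

Both queries assign the same bins here. The practical difference is where the boundaries are maintained: in the query text for the case-based form, or as data in a table for the join-based form.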

Best of all: rquery gets some attention and turns out to be the dominant solution at all scales measured.

[Figure: example timing from the article; lower times are better.]

So please check the article out.

Categories: Exciting Techniques, Tutorials

“If You Were an R Function, What Function Would You Be?”

We’ve been getting some good uptake on our piping in R article announcement.

The article is necessarily a bit technical. But one of its key points comes from the observation that piping into names is a special opportunity to give general objects the following personality quiz: “If you were an R function, what function would you be?”

Categories: Administrativia

R Journal Volume 10/2, December 2018 is out!

We forgot to say: R Journal Volume 10/2, December 2018 is out!


A huge thanks to the editors who work very hard to make this possible.

And a big “thank you” to the editors, referees, and journal for helping improve, and for including, our note on pipes in R.

Categories: Coding, Opinion, Tutorials

More on Macros in R

I recently ran into something interesting on the R macros/quasi-quotation/substitution/syntax front:


Romain François: “.@_lionelhenry reveals planned double curly syntax At #satRdayParis as a possible replacement, addition to !! and enquo()”

It appears !! is no longer the last word in substitution (it certainly wasn’t the first).

Categories: Coding, Tutorials

Getting Started With rquery

To make getting started with rquery (an advanced query generator for R) easier, we have reworked the package README for various data sources (including SparkR!).

Categories: Coding, Opinion

Playing With Pipe Notations

Recently Hadley Wickham prescribed pronouncing the magrittr pipe as “then” and using right-assignment as follows:


I am not sure if it is a good or bad idea. But let’s play with it a bit, and perhaps readers can submit their experience and opinions in the comments section.

Categories: data science, Exciting Techniques, Tutorials

Query Generation in R

R users have been enjoying the benefits of SQL query generators for quite some time, most notably using the dbplyr package. I would like to talk about some features of our own rquery query generator, concentrating on derived result re-use.

Categories: Administrativia, data science, Opinion, Statistics

PDSwR2 Free Excerpt and New Discount Code

Manning has a new discount code and a free excerpt of our book Practical Data Science with R, 2nd Edition: here.

This section is elementary, but things really pick up speed later on (also available in a paid preview).

Categories: Exciting Techniques, Opinion, Tutorials

cdata Control Table Keys

In our cdata R package and training materials we emphasize record-oriented thinking and how to design a transform control table. We now have an exciting new feature: control table keys.

The user can now control which columns of a cdata control table are the keys, including composite keys (that is, keys spread across more than one column). This is easiest to demonstrate with an example.
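As a rough illustration of the idea (in plain Python, not the cdata API), the sketch below treats a control table as a list of rows in which the key columns hold literal key values and the remaining cells name the source columns to read from a wide record. The function `record_to_blocks`, the column names, and the sample record are all hypothetical.

```python
# Control table with a composite key spread across two columns,
# "Part" and "Measure". The non-key column "Value" holds the *name*
# of the wide-record column whose value should be placed there.
control_table = [
    {"Part": "Petal", "Measure": "Length", "Value": "Petal.Length"},
    {"Part": "Petal", "Measure": "Width", "Value": "Petal.Width"},
    {"Part": "Sepal", "Measure": "Length", "Value": "Sepal.Length"},
    {"Part": "Sepal", "Measure": "Width", "Value": "Sepal.Width"},
]


def record_to_blocks(record, control_table, key_columns, id_columns):
    """Expand one wide record into one output row per control table row."""
    rows = []
    for ctrl in control_table:
        # Carry the record identifier columns through unchanged.
        row = {c: record[c] for c in id_columns}
        for col, entry in ctrl.items():
            if col in key_columns:
                row[col] = entry          # key cell: copy the literal key value
            else:
                row[col] = record[entry]  # value cell: look up the named column
        rows.append(row)
    return rows


wide_record = {
    "id": 1,
    "Petal.Length": 1.4,
    "Petal.Width": 0.2,
    "Sepal.Length": 5.1,
    "Sepal.Width": 3.5,
}

rows = record_to_blocks(
    wide_record,
    control_table,
    key_columns=["Part", "Measure"],
    id_columns=["id"],
)
for r in rows:
    print(r)
```

With a single key column, "Part" and "Measure" would have to be fused into one label; declaring both as keys lets each output row be identified by the pair.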

Categories: Administrativia, data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics

PDSwR2: New Chapters!

We have two new chapters of Practical Data Science with R, Second Edition online and available for review!


The newly available chapters cover:

Data Engineering And Data Shaping – Explores how to use R to organize or wrangle data into a shape useful for analysis. The chapter covers applying data transforms, data manipulation packages, and more.

Choosing and Evaluating Models – The chapter starts with exploring machine learning approaches and then moves to studying key model evaluation topics like mapping business problems to machine learning tasks, evaluating model quality, and how to explain model predictions.

If you haven’t signed up for our book’s MEAP (Manning Early Access Program), we encourage you to do so. The MEAP includes a free copy of Practical Data Science with R, First Edition, as well as early access to chapter drafts of the second edition as we complete them.

For those of you who have already subscribed — thank you! We hope you enjoy the new chapters, and we look forward to your feedback.