Categories: Coding, Opinion, Tutorials

magrittr and wrapr Pipes in R, an Examination

Let’s consider piping in R both using the magrittr package and using the wrapr package.


Categories: Administrativia, data science, Opinion, Practical Data Science, Pragmatic Data Science, Statistics

Four Years of Practical Data Science with R

Four years ago today we, authors Nina Zumel and John Mount, received our authors’ copies of Practical Data Science with R!



Categories: Coding, Opinion, Pragmatic Data Science, Statistics, Tutorials

R Tip: Think in Terms of Values

R tip: first organize your tasks in terms of data, values, and desired transformation of values, not initially in terms of concrete functions or code.

I know I write a lot about coding in R. But it is in the service of supporting statistics, analysis, predictive analytics, and data science.

R without data is like going to the theater to watch the curtain go up and down.

(Adapted from Ben Katchor’s Julius Knipl, Real Estate Photographer: Stories, Little, Brown, and Company, 1996, page 72, “Excursionist Drama 2”.)

Usually you come to R to work with data. If you think and plan in terms of data and values (including introducing more data to control processing), you will usually work in a much faster, more explainable, and more maintainable fashion.


Categories: Coding, Opinion, Statistics, Tutorials

R Tip: Use let() to Re-Map Names

Another R tip. Need to replace a name in some R code or make R code re-usable? Use wrapr::let().
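
As a minimal sketch of what this can look like (the data frame `d`, the variable `column_to_use`, and the placeholder name `COL` are illustrative assumptions, not taken from the full article), `wrapr::let()` takes a named mapping of placeholder names to concrete names and evaluates an expression with the placeholders substituted:

```r
library(wrapr)

# Hypothetical example data; not from the original post.
d <- data.frame(x = c(1, 2, 3))

# The column name arrives as a string, for example as a function argument.
column_to_use <- "x"

# COL is a placeholder; let() substitutes the name "x" for it
# before evaluating the expression, so this evaluates d$x + 1.
result <- let(
  c(COL = column_to_use),
  d$COL + 1
)

print(result)
```

The same block of code can then be re-used for another column by changing only the value of `column_to_use`.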




Categories: Coding, Opinion, Statistics, Tutorials

R Tip: Break up Function Nesting for Legibility

There are a number of easy ways to avoid illegible code nesting problems in R.

In this R tip we will expand upon the above statement with a simple example.
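
One such approach, sketched here in base R under illustrative assumptions (the data and variable names are hypothetical and not taken from the full article), is to give each intermediate value a name instead of nesting calls:

```r
# Illustrative data (hypothetical, not from the original post).
x <- c(1, 4, 9)

# Nested form: has to be read from the inside out.
nested <- round(sqrt(sum(x)), digits = 2)

# Stepwise form: each intermediate value gets a name,
# and the steps read top to bottom in execution order.
total <- sum(x)
root <- sqrt(total)
stepwise <- round(root, digits = 2)

# Both forms compute the same value.
stopifnot(identical(nested, stepwise))
```

Naming intermediates also makes it easy to inspect each step while debugging, at the cost of a few extra variables.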


Categories: Coding, Opinion

Take Care If Trying the RPostgres Package

Take care if trying the new RPostgres database connection package. By default it returns some non-standard types that code developed against other database drivers may not expect, and may not be ready to defend against.



Danger, Will Robinson!


Categories: Opinion, Rants, Statistics

The Many Faces of R

Some days I see R as an eclectic programming language preferred by scientists.

“Programming languages as people.”


From Leftover Salad (David Marino).

Other days I see it more like the following.


Categories: Coding, Opinion, Programming, Statistics, Tutorials

Is R base::subset() really that bad?

Is R base::subset() really that bad?



Categories: data science, Opinion, Statistics, Tutorials

We Want to be Playing with a Moderate Number of Powerful Blocks

Many data scientists (and even statisticians) suffer under one of the following misapprehensions:

  • They believe a technique doesn’t work in their current situation (when in fact it does), leading to useless precautions and missed opportunities.
  • They believe a technique does work in their current situation (when in fact it does not), leading to failed experiments or incorrect results.

I feel this happens less often if you are working with observable and composable tools of the proper scale. Somewhere between monolithic all-in-one systems and ad hoc one-off coding is a cognitive sweet spot where great work can be done.


Categories: Coding, Computer Science, data science, Opinion, Programming, Statistics, Tutorials

Base R can be Fast

“Base R” (call it “Pure R”, “Good Old R”, just don’t call it “Old R” or late for dinner) can be fast for in-memory tasks. This is despite the commonly repeated claim that: “packages written in C/C++ are (edit: “always”) faster than R code.”

The benchmark results of “rquery: Fast Data Manipulation in R” really called out for follow-up timing experiments. This note is one such set of experiments, this time concentrating on in-memory (non-database) solutions.

Below is a graph summarizing our new results for a number of in-memory implementations, a range of data sizes, and two different machine types.

[Figure: graph summarizing timing results for in-memory implementations across data sizes and two machine types]