Posted on Categories Coding, Computer Science, data science, Opinion, Programming, Statistics, TutorialsTags , , , , 11 Comments on Base R can be Fast

Base R can be Fast

“Base R” (call it “Pure R”, “Good Old R”, just don’t call it “Old R” or late for dinner) can be fast for in-memory tasks. This is despite the commonly repeated claim that: “packages written in C/C++ are (edit: “always”) faster than R code.”

The benchmark results of “rquery: Fast Data Manipulation in R” really called out for follow-up timing experiments. This note is one such set of experiments, this time concentrating on in-memory (non-database) solutions.

Below is a graph summarizing our new results for a number of in-memory implementations, a range of data sizes, and two different machine types.

Unnamed chunk 2 1

Continue reading Base R can be Fast

Posted on Categories Computers, Programming, Statistics, TutorialsTags , , 5 Comments on Setting up RStudio Server quickly on Amazon EC2

Setting up RStudio Server quickly on Amazon EC2

I have recently been working on projects using Amazon EC2 (elastic compute cloud), and RStudio Server. I thought I would share some of my working notes.

Amazon EC2 supplies near instant access to on-demand disposable computing in a variety of sizes (billed in hours). RStudio Server supplies an interactive user interface to your remote R environment that is nearly indistinguishable from a local RStudio console. The idea is: for a few dollars you can work interactively on R tasks requiring hundreds of GB of memory and tens of CPUs and GPUs.

If you are already an Amazon EC2 user with some Unix experience it is very easy to quickly stand up a powerful R environment, which is what I will demonstrate in this note.

Continue reading Setting up RStudio Server quickly on Amazon EC2

Posted on Categories Computer Science, data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, ProgrammingTags , , , , , , 3 Comments on rquery: Fast Data Manipulation in R

rquery: Fast Data Manipulation in R

Win-Vector LLC recently announced the rquery R package, an operator based query generator.

In this note I want to share some exciting and favorable initial rquery benchmark timings.

Continue reading rquery: Fast Data Manipulation in R

Posted on Categories data science, Programming, TutorialsTags , , 1 Comment on New wrapr R pipeline feature: wrapr_applicable

New wrapr R pipeline feature: wrapr_applicable

The R package wrapr now has a neat new feature: “wrapr_applicable”.

Wraprs

This feature allows objects to declare a surrogate function to stand in for the object in wrapr pipelines. It is a powerful technique and allowed us to quickly implement a convenient new ad hoc query mode for rquery.

A small effort in making a package “wrapr aware” appears to have a fairly large payoff.

Posted on Categories Administrativia, Exciting Techniques, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, StatisticsTags , , , , Leave a comment on Big cdata News

Big cdata News

I have some big news about our R package cdata. We have greatly improved the calling interface and Nina Zumel has just written the definitive introduction to cdata.

cdata is our general coordinatized data tool. It is what powers the deep learning performance graph (here demonstrated with R and Keras) that I announced a while ago.

KerasPlot

However, cdata is much more than that.

Continue reading Big cdata News