Categories: Coding, Opinion, Programming, Statistics, Tutorials

R Tip: Use Slices

R tip: use slices.


R has a very powerful array slicing ability that allows for some very slick data processing.
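As a rough illustration of what slicing looks like in base R (the data here is made up for the sketch, and this is not necessarily the example the full post uses):

```r
# Illustrative data (hypothetical values).
x <- c(1, 4, 9, 16, 25)

# An index vector selects several positions at once.
first_three <- x[1:3]

# Negative indices drop positions: x[-1] is everything but the first
# element, x[-length(x)] is everything but the last. Subtracting the two
# slices gives the step between neighbors without writing a loop.
step_sizes <- x[-1] - x[-length(x)]

# A logical vector slices by condition.
large <- x[x > 5]

# The same notation works on both dimensions of a matrix.
m <- matrix(1:12, nrow = 3)
block <- m[2:3, c(1, 4)]
```

The point of the sketch is that each line operates on a whole slice at a time, which is usually both shorter and faster than element-by-element code.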

Continue reading R Tip: Use Slices

Categories: Opinion, Programming, Statistics

Neglected R Super Functions

R has a lot of under-appreciated super powerful functions. I list a few of our favorites below.



Atlas, carrying the sky. Royal Palace (Paleis op de Dam), Amsterdam.

Photo: Dominik Bartsch, CC some rights reserved.

Continue reading Neglected R Super Functions

Categories: Coding, Programming, Tutorials

R Tip: Use drop = FALSE with data.frames

Another R tip. Get in the habit of using drop = FALSE when indexing (using [ , ] on) data.frames.
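A minimal sketch of the behavior behind this tip, using a small made-up data.frame (the column names and values are illustrative only):

```r
# Illustrative data.frame (hypothetical values).
d <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Selecting a single column with [ , ] silently returns a bare vector,
# not a data.frame.
v <- d[, "x"]
class(v)

# With drop = FALSE the result stays a data.frame, whatever the number
# of columns selected.
d1 <- d[, "x", drop = FALSE]
class(d1)
```

Code that later calls `nrow()` or `colnames()` on the result behaves the same for one column as for many only in the `drop = FALSE` form.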


Prince Rupert’s drops (img: Wikimedia Commons)

Continue reading R Tip: Use drop = FALSE with data.frames

Categories: Coding, data science, Exciting Techniques, Programming, Statistics, Tutorials

Wanted: cdata Test Pilots

I need a few volunteers to “test pilot” the development version of the R package cdata, please.

Jackie Cochran at 1938 Bendix Race
Jacqueline Cochran: at the time of her death, no other pilot held more speed, distance, or altitude records in aviation history than Cochran.

Continue reading Wanted: cdata Test Pilots

Categories: Coding, Opinion, Programming, Statistics, Tutorials

Is R base::subset() really that bad?

Is R base::subset() really that bad?

The Hitchhiker's Guide to the Galaxy

Continue reading Is R base::subset() really that bad?

Categories: Coding, data science, Programming, Statistics

Is 10,000 Cells Big?

Trick question: is a 10,000 cell numeric data.frame big or small?

In the era of "big data", 10,000 cells is minuscule. Such data would fit on fewer than 1,000 punched cards (less than half a box).


Punch card

The joking answer is: it is small when they are selling you the system, but can be considered unfairly large later.

Continue reading Is 10,000 Cells Big?

Categories: Exciting Techniques, Programming, Statistics, Tutorials

Supercharge your R code with wrapr

I would like to demonstrate some helpful wrapr R notation tools that really neaten up your R code.


1968 AMX blown and tubbed e

Img: Christopher Ziemnowicz.

Continue reading Supercharge your R code with wrapr

Categories: Coding, Programming, Tutorials

Advisory on Multiple Assignment dplyr::mutate() on Databases

I currently advise R dplyr users to take care when using multiple assignment dplyr::mutate() commands on databases.



(image: Kingroyos, Creative Commons Attribution-Share Alike 3.0 Unported License)

In this note I exhibit a troublesome example, and a systematic solution.

Continue reading Advisory on Multiple Assignment dplyr::mutate() on Databases

Categories: Coding, Computer Science, data science, Opinion, Programming, Statistics, Tutorials

Base R can be Fast

“Base R” (call it “Pure R”, “Good Old R”, just don’t call it “Old R” or late for dinner) can be fast for in-memory tasks. This is despite the commonly repeated claim that: “packages written in C/C++ are (edit: “always”) faster than R code.”

The benchmark results of “rquery: Fast Data Manipulation in R” really called out for follow-up timing experiments. This note is one such set of experiments, this time concentrating on in-memory (non-database) solutions.

Below is a graph summarizing our new results for a number of in-memory implementations, a range of data sizes, and two different machine types.

Continue reading Base R can be Fast

Categories: Computers, Programming, Statistics, Tutorials

Setting up RStudio Server quickly on Amazon EC2

I have recently been working on projects using Amazon EC2 (Elastic Compute Cloud) and RStudio Server. I thought I would share some of my working notes.

Amazon EC2 supplies near instant access to on-demand disposable computing in a variety of sizes (billed in hours). RStudio Server supplies an interactive user interface to your remote R environment that is nearly indistinguishable from a local RStudio console. The idea is: for a few dollars you can work interactively on R tasks requiring hundreds of GB of memory and tens of CPUs and GPUs.

If you are already an Amazon EC2 user with some Unix experience it is very easy to quickly stand up a powerful R environment, which is what I will demonstrate in this note.

Continue reading Setting up RStudio Server quickly on Amazon EC2