Posted on Categories Programming, TutorialsTags , , 1 Comment on An R function return and assignment puzzle

An R function return and assignment puzzle

Here is an R programming puzzle. What does the following code snippet actually do? And ever harder: what does it mean? (See here for some material on the difference between what code does and what code means.)

f <- function() { x <- 5 }

In R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree" the code appears to call the function f() and return nothing (nothing is printed). When teaching I often state that you should explicitly use a non-assignment expression as your return value. You should write code such as the following:

f <- function() { x <- 5; x }
## [1] 5

(We are showing an R output as being prefixed with ##.)

But take a look at the this:

f <- function() { x <- 5 }
## [1] 5

It prints! Read further for what is really going on.

NewImage Continue reading An R function return and assignment puzzle

Posted on Categories AdministrativiaTags ,

Win-Vector news

Just an update of what we have been up to lately at Win-Vector LLC, and a reminder of some of our current offerings. It has been busy lately (and that is good).

Our current professional service offerings continue to be data science consulting (helping companies extract value from their data and data infrastructure) and on-site corporate training. We have been honored to recently deliver our training to teams at Salesforce and Genentech. Continue reading Win-Vector news

Posted on Categories Practical Data Science, Pragmatic Data Science, StatisticsTags , , ,

Practical Data Science with R examples

One of the big points of Practical Data Science with R is to supply a large number of fully worked examples. Our intent has always been for readers to read the book, and if they wanted to follow up on a data set or technique to find the matching worked examples in the project directory of our book support materials git repository.

Some readers want to work much closer to the sequence in the book. To make working along with book easier we extracted all book examples and shared them with our readers (in a Github directory, and a downloadable zip file, press “Raw” to download). The direct extraction from the book guarantees the files are in sync with our revised book. However there are trade-offs, sometimes (for legibility) the book mixed input and output without using R’s comment conventions. So you can’t always just paste everything. Also for a snippet to run you may need some libraries, data and results of previous snippets to be present in your R environment.

To help these readers we have added a new section to the book support materials: knitr markdown sheets that work all the book extracts from each chapter. Each chapter and appendix now has a matching markdown file that sets up the correct context to run each and every snippet extracted from the book. In principle you can now clone the entire zmPDSwR repository to your local machine and run all the from the CodeExamples directory by using the RStudio project in RunExamples. Correct execution also depens on having the right packages installed so we have also added a worksheet showing everything we expect to see installed in one place: InstallAll.Rmd (note some of the packages require external dependencies to work such as a C compiler, curl libraries, and a Java framework to run).

Posted on Categories math programming, Mathematics, StatisticsTags , , ,

Sequential Analysis

We here at Win-Vector LLC been working through an ad-hoc series about A/B testing combining elements of both operations research and statistical points of view.

Our most recent article was a dynamic programming solution to the A/B test problem. Explicitly solving such dynamic programs gets long and tedious, so you are well served by finding and introducing clever invariants to track (something better than just raw win-rates). That clever idea is called “sequential analysis” and was introduced by Abraham Wald (somebody we have written about before). If you have ever heard of a test plan such as “first process to get more than 30 wins ahead of the other is the one we choose” you have seen methods derived from Wald’s sequential analysis technique.

Wald’s famous airplane armor problem

A corrected version of the detailed article is now here.

Posted on Categories Opinion, StatisticsTags , , , 3 Comments on What was data science before it was called data science?

What was data science before it was called data science?

“Data Science” is obviously a trendy term making it way through the hype cycle. Either nobody is good enough to be a data scientist (unicorns) or everybody is too good to be a data scientist (or the truth is somewhere in the middle).


Gartner hype cycle (Wikipedia).

And there is a quarter that grumbles that we are merely talking about statistics under a new name (see here and here).

It has always been the case that advances in data engineering (such as punch cards, or data centers) make analysis practical at new scales (though I still suspect Map/Reduce was a plot designed to trick engineers into being excited about ETL and report generation).


Data Science 1832: Semen Korsakov card.

However, in the 1940s and 1950s the field was called “operations research” (even when performed by statisticians). When you read John F. Magee, (2002) “Operations Research at Arthur D. Little, Inc.: The Early Years”, Operations Research 50(1):149-153 you really come away with the impression you are reading about a study of online advertising performed in the 1940s (okay mail advertising, but mail was “the email of its time”).

In this spirit next week we will write about the sequential analysis solution for A/B-testing, invented in the 1940s by one of the greats of statistics and operations research: Abraham Wald (whom we have written about before).


Abraham Wald