Free video course: applied Bayesian A/B testing in R

Posted on Categories Administrativia, Pragmatic Data Science, StatisticsTags , , Leave a comment on Free video course: applied Bayesian A/B testing in R

As a “thank you” to our blog, mailing list, and Twitter followers (@WinVectorLLC) we at Win-Vector LLC have decided to re-release our formerly fee-based A/B testing video course as a free (advertisement supported) video course here on Youtube.


The course emphasizes how to design A/B tests using prior “guestimates” of effect sizes (often you have these from prior campaigns, or somebody claims an effect size and it is merely your job to confirm it). It is fairly technical, and the emphasis is Bayesian- where we are trying to get an actual estimate of the distribution unknown true expected payoff rate of the various campaigns (the so-called posteriors). We show how to design and evaluate a sales campaigns for a product at two different price points.

The solution is coded in R and Nina Zumel has contributed an updated Shiny user interface demonstrating the technique (for more on Shiny, please see here). The code for the calculation methods and older shiny app are shared here. Continue reading Free video course: applied Bayesian A/B testing in R

“Introduction to Data Science” video course contest is closed

Posted on Categories Administrativia, CodingTags , , Leave a comment on “Introduction to Data Science” video course contest is closed

Congratulations to all the winners of the Win-Vector “Introduction to Data Science” Video Course giveaway! We’ve emailed all of you your individual subscription coupons. Continue reading “Introduction to Data Science” video course contest is closed

Win-Vector data science mailing list (and a give-away!)

Posted on Categories Administrativia, StatisticsTags 2 Comments on Win-Vector data science mailing list (and a give-away!)

Win-Vector LLC is starting a data science mailing list that we would like you to sign up for. It is going to be a (deliberately infrequent) set of updates including Win-Vector LLC notices, upcoming speaking events, and data science products.

To kick this off we will be awarding 5 free permanent subscriptions to our video course “Introduction to Data Science” to people who join the mailing list in January 2016 (people who have already signed up already eligible!).
(The contest is over, thank you all for entering!).

For more news/announcements please follow us on:

Nina Zumel and John Mount part of R Day at Strata + Hadoop World in San Jose 2016

Posted on Categories Administrativia, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, StatisticsTags Leave a comment on Nina Zumel and John Mount part of R Day at Strata + Hadoop World in San Jose 2016

Nina Zumel and I are honored to have been invited to be part of Strata + Hadoop World in San Jose 2016 R Day organized by RStudio and O’Reilly. Continue reading Nina Zumel and John Mount part of R Day at Strata + Hadoop World in San Jose 2016

Win-Vector news

Posted on Categories AdministrativiaTags , Leave a comment on Win-Vector news

Just an update of what we have been up to lately at Win-Vector LLC, and a reminder of some of our current offerings. It has been busy lately (and that is good).

Our current professional service offerings continue to be data science consulting (helping companies extract value from their data and data infrastructure) and on-site corporate training. We have been honored to recently deliver our training to teams at Salesforce and Genentech. Continue reading Win-Vector news

Wald’s sequential analysis technique

Posted on Categories AdministrativiaTags , , , Leave a comment on Wald’s sequential analysis technique

Microsoft Revolution Analytics has just posted our latest article on A/B testing: Wald’s graphical sequential inspection procedure. It is a fun appreciation of a really cool procedure and I hope you check it out.

IMG 1692
Figure 14, Section 6.4.2, page 111, Abraham Wald, Sequential Analysis, Dover 2004 (reprinting a 1947 edition).

Upcoming Win-Vector Appearances

Posted on Categories Administrativia, data science, Statistics, TutorialsTags , 1 Comment on Upcoming Win-Vector Appearances

We have two public appearances coming up in the next few weeks:

Workshop at ODSC, San Francisco – November 14

Both of us will be giving a two-hour workshop called Preparing Data for Analysis using R: Basic through Advanced Techniques. We will cover key issues in this important but often neglected aspect of data science, what can go wrong, and how to fix it. This is part of the Open Data Science Conference (ODSC) at the Marriot Waterfront in Burlingame, California, November 14-15. If you are attending this conference, we look forward to seeing you there!

You can find an abstract for the workshop, along with links to software and code you can download ahead of time, here.

An Introduction to Differential Privacy as Applied to Machine Learning: Women in ML/DS – December 2

I (Nina) will give a talk to the Bay Area Women in Machine Learning & Data Science Meetup group, on applying differential privacy for reusable hold-out sets in machine learning. The talk will also cover the use of differential privacy in effects coding (what we’ve been calling “impact coding”) to reduce the bias that can arise from the use of nested models. Information about the talk, and the meetup group, can be found here.

We’re looking forward to these upcoming appearances, and we hope you can make one or both of them.

Our Differential Privacy Mini-series

Posted on Categories Administrativia, Computer Science, data science, Exciting Techniques, Statistics, UncategorizedTags , , , ,

We’ve just finished off a series of articles on some recent research results applying differential privacy to improve machine learning. Some of these results are pretty technical, so we thought it was worth working through concrete examples. And some of the original results are locked behind academic journal paywalls, so we’ve tried to touch on the highlights of the papers, and to play around with variations of our own.

Blurry snowflakes stock by cosmicgallifrey d3inho1

  • A Simpler Explanation of Differential Privacy: Quick explanation of epsilon-differential privacy, and an introduction to an algorithm for safely reusing holdout data, recently published in Science (Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth, “The reusable holdout: Preserving validity in adaptive data analysis”, Science, vol 349, no. 6248, pp. 636-638, August 2015).

    Note that Cynthia Dwork is one of the inventors of differential privacy, originally used in the analysis of sensitive information.

  • Using differential privacy to reuse training data: Specifically, how differential privacy helps you build efficient encodings of categorical variables with many levels from your training data without introducing undue bias into downstream modeling.
  • A simple differentially private-ish procedure: The bootstrap as an alternative to Laplace noise to introduce privacy.

Our R code and experiments are available on Github here, so you can try some experiments and variations yourself.

Image Credit

Thank you Joseph Rickert!

Posted on Categories Administrativia, Practical Data ScienceTags , , , ,

A bit of text we are proud to steal from our good friend Joseph Rickert:

Then, for some very readable background material on SVMs I recommend section 13.4 of Applied Predictive Modeling and sections 9.3 and 9.4 of Practical Data Science with R by Nina Zumel and John Mount. You will be hard pressed to find an introduction to kernel methods and SVMs that is as clear and useful as this last reference.

For more on SVMs see the original article on the Revolution Analytics blog.

Some key Win-Vector serial data science articles

Posted on Categories Administrativia, Practical Data Science, StatisticsTags , ,

As readers have surely noticed the Win-Vector LLC blog isn’t a stream of short notes, but instead a collection of long technical articles. It is the only way we can properly treat topics of consequence.


What not everybody may have noticed is a number of these articles are serialized into series for deeper comprehension. The key series include:

  • Statistics to English translation.

    This series tries to find vibrant applications and explanations of standard good statistical practices, to make them more approachable to the non statistician.

  • Statistics as it should be.

    This series tries to cover cutting edge machine learning techniques, and then adapt and explain them in traditional statistical terms.

  • R as it is.

    This series tries to teach the statistical programming language R “warts and all” so we can see it as the versatile and powerful data science tool that it is.

To get a taste of what we are up to in our writing please checkout our blog highlights and these series. For deeper treatments of more operational topics also check out our book Practical Data Science with R.

Or if you have something particular you need solved consider engaging us at Win-Vector LLC for data science consulting and/or training.