Categories: Administrativia, Practical Data Science

## Nina and John Speaking at Why R? Webinar Thursday, May 7, 2020

Nina Zumel and John Mount will be speaking on advanced data preparation for supervised machine learning at the Why R? Webinar Thursday, May 7, 2020.

This is 8 PM in the GMT+2 time zone, which for us is 11 AM Pacific Time. Hope to see you there!

Categories: data science, Mathematics, Statistics, Tutorials

## Imputing Out of Mixtures, or Un-Stirring Spicy Soup

Here is a fun combinatorial puzzle. I’ve probably seen it used for teaching before, but let’s try to define and work this one from memory. I would love to hear more solutions and analyses of this problem.

Suppose you have `n` kettles of soup labeled `0` through `n-1`. For our problem we assume that `k` kettles of soup are extremely spicy. We want to figure out which kettles contain spicy soup.

Image source: Mad Dog 357 / Amazon

This presents an interesting puzzle when `k` is much smaller than `n`: we are assuming that spicy is a rare event we want to detect. We are also assuming the spicy soups are so spicy that they remain spicy even when combined with other soups. So when we prepare a mixture of soups, we taste the union of the spiciness of the included soups: the bowl is spicy if at least one of its soups is spicy, and mild otherwise.
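A minimal sketch of this tasting model, assuming (for illustration only) that we represent the spicy kettles as a set of indices and a bowl as the set of kettle indices sampled into it. The function name `tastes_spicy` is hypothetical. A bowl tastes spicy exactly when it shares at least one kettle with the spicy set.

```python
def tastes_spicy(bowl: set[int], spicy: set[int]) -> bool:
    """Return True if the bowl contains at least one spicy kettle.

    Spiciness survives mixing, so a bowl's result is the logical OR
    (union) over the kettles sampled into it.
    """
    return not bowl.isdisjoint(spicy)


# Hypothetical example: 8 kettles, kettles 2 and 5 are spicy.
spicy = {2, 5}
print(tastes_spicy({0, 1, 3}, spicy))  # False
print(tastes_spicy({1, 2, 7}, spicy))  # True
```

Note that a single spicy bowl only tells us that *some* ingredient is spicy, not which one; that loss of information is what makes the puzzle interesting.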

The question is: if we prepare tasting bowls that are mixtures of samples from the kettles, how many bowls do we have to prepare to reliably identify all of the spicy kettles? This is hopefully in the spirit of the “counterfeit gold coin puzzle” as seen in the Columbo detective show (though I end up using a bit more math).
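For the special case `k = 1` there is a classic answer, sketched here as our own illustration rather than as the analysis this post develops: write each kettle's label in binary and let bowl `j` contain a sample of every kettle whose label has bit `j` set. Each bowl is a single spicy/mild observation, so `b` bowls can distinguish at most `2^b` outcomes; with `n` possible spicy kettles we need at least ⌈log₂ n⌉ bowls, and this scheme uses exactly that many. The function names are hypothetical, and the decoder assumes exactly one kettle is spicy.

```python
def tastes_spicy(bowl: set[int], spicy: set[int]) -> bool:
    """A bowl is spicy if it contains at least one spicy kettle (logical OR)."""
    return not bowl.isdisjoint(spicy)


def binary_bowls(n: int) -> list[set[int]]:
    """Bowl j holds a sample of every kettle whose label has bit j set."""
    num_bowls = max(1, (n - 1).bit_length())  # equals ceil(log2(n)) for n >= 2
    return [{i for i in range(n) if (i >> j) & 1} for j in range(num_bowls)]


def decode_single_spicy(results: list[bool]) -> int:
    """Read the spicy kettle's label back: bowl j spicy means bit j is 1."""
    return sum(1 << j for j, is_spicy in enumerate(results) if is_spicy)


n = 100
bowls = binary_bowls(n)
# Check every possible single spicy kettle is recovered correctly.
for secret in range(n):
    results = [tastes_spicy(b, {secret}) for b in bowls]
    assert decode_single_spicy(results) == secret
print(len(bowls))  # 7
```

This simple binary code breaks down once `k > 1`: for example, kettles `1` (binary `01`) and `2` (binary `10`) together make both bowls spicy, which is indistinguishable from kettle `3` (binary `11`) alone. Handling larger `k` needs a different bowl design.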


## Discount on Manning Books, Including our own Practical Data Science with R 2nd Edition

We have a discount on Manning Books, including our own Practical Data Science with R 2nd Edition!

Categories: Coding, data science, math programming, Statistics, Tutorials

## Y-Conditionally Regularized Neural Nets

Win Vector LLC’s Dr. Nina Zumel has had great success applying y-aware methods to machine learning problems, and in working out the detailed cross-validation methods needed to make y-aware procedures safe. I thought we would try our hand at y-aware neural net (deep learning) methods here.