
Data engineering and data shaping in Practical Data Science with R 2nd Edition

A kind reader recently shared the following comment on the Practical Data Science with R 2nd Edition live-site.

Thanks for the chapter on data frames and data.tables. It has helped me overcome an obstacle freeing me from a lot of warnings telling me my data table was not a real . It reduced the calculation time for a scenario in modelStudio from 30 minutes to 7 minutes. Following the advice in your book is helping me a lot with understanding R and the models you can create with R: Thanks

This is exactly what we were hoping for when we added Chapter 5, Data engineering and data shaping, to the 2nd edition of the book. The chapter is organized by data manipulation task (what you are trying to do, or your sub-goal) and then teaches the method for each task in base-R, data.table, and dplyr. The hope was a Rosetta Stone of data manipulation solutions that would help many readers and not lock them into any one notation.


General Data Science Means Cross-Language Tools, Training, and Documentation

Data science is often a case of bringing the tools to the problems and data, instead of insisting on bringing the problems and data to the tools.

To support cross-language data science we have been working on cross-language tools, documentation, and training.



Deal of the Day May 10: Half off Practical Data Science with R, Second Edition

Deal of the Day May 10: Half off Practical Data Science with R, Second Edition. Use code dotd051020au at https://bit.ly/2xLRPCk



Nina and John Speaking at Why R? Webinar Thursday, May 7, 2020

Nina Zumel and John Mount will be speaking on advanced data preparation for supervised machine learning at the Why R? Webinar Thursday, May 7, 2020.


This is at 8 PM in the GMT+2 timezone, which for us is 11 AM Pacific Time. Hope to see you there!


Some Applications of The Spicy Soup Test

Here are a few isolation-inspired “applications” (in the theoretical or mathematical sense of the term) of the spicy soup combinatorial design.



Imputing Out of Mixtures, or Un-Stirring Spicy Soup

Here is a fun combinatorial puzzle. I’ve probably seen this used to teach before, but let’s try to define or work this one from memory. I would love to hear more solutions/analyses of this problem.

Suppose you have n kettles of soup labeled 0 through n-1. For our problem we assume that k kettles of soup are extremely spicy. We want to figure out which kettles contain spicy soup.


Image source: Mad Dog 357 / Amazon

This presents an interesting puzzle when k is much smaller than n. We are assuming that spiciness is a rare event we want to detect. We are also assuming the spicy soups are so spicy that they remain spicy even when combined with other soups. So when we prepare a mixture of soups, it tastes spicy exactly when at least one of the included soups is spicy.

The question is: if we prepare tasting bowls that are mixtures of samples from the kettles, how many bowls do we have to prepare to reliably identify all of the spicy soup kettles? This is hopefully in the spirit of the “counterfeit gold coin puzzle” as seen in the Columbo detective show (though I end up using a bit more math).
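To make the setup concrete, here is a minimal Python sketch of the simplest case, k = 1. It uses the familiar binary-label design, which is not necessarily the design the full article develops: bowl b receives a sample from every kettle whose label has bit b set, so the pattern of spicy bowls spells out the spicy kettle's label. The function names (`make_bowls`, `taste`, `decode_single`) are hypothetical, and the decoder assumes exactly one spicy kettle.

```python
def make_bowls(n):
    """Bowl b mixes samples from every kettle whose label has bit b set."""
    n_bowls = max(1, (n - 1).bit_length())  # bits needed to write labels 0..n-1
    return [[kettle for kettle in range(n) if (kettle >> b) & 1]
            for b in range(n_bowls)]


def taste(bowls, spicy):
    """A bowl tastes spicy if it contains at least one spicy kettle."""
    spicy = set(spicy)
    return [any(kettle in spicy for kettle in bowl) for bowl in bowls]


def decode_single(results):
    """Assuming exactly one spicy kettle, rebuild its label from the spicy bowls."""
    return sum(1 << b for b, is_spicy in enumerate(results) if is_spicy)


n = 1000
bowls = make_bowls(n)
results = taste(bowls, spicy=[613])
print(len(bowls), decode_single(results))  # prints: 10 613
```

With n = 1000 kettles this needs only 10 bowls, all prepared up front. The same design is not enough for k = 2 or more, since the spicy bowls then only reveal the bitwise OR of the spicy labels; that harder case is where the puzzle gets interesting.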



Discount on Manning Books, Including our own Practical Data Science with R 2nd Edition

We have a discount on Manning Books, including our own Practical Data Science with R 2nd Edition!



Y-Conditionally Regularized Neural Nets

Win Vector LLC’s Dr. Nina Zumel has had great success applying y-aware methods to machine learning problems, and working out the detailed cross-validation methods needed to make y-aware procedures safe. I thought I would try my hand at y-aware neural net or deep learning methods here.



R Tip: How To Look Up Matrix Values Quickly

R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values.

Of course, sometimes it takes a while to figure out how to do this. Please read on for a great R matrix lookup problem and solution.
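As a rough illustration of what a vectorized lookup looks like, here is a small sketch in numpy, one of the peers named above. It is an assumed example, not the problem or solution from the full post: the matrix `m` and the index vectors `rows` and `cols` are made up. A single indexing expression pulls one cell per (row, column) pair, with no explicit loop.

```python
import numpy as np

# A small 3x4 matrix with entries 0..11 (illustrative data only).
m = np.arange(12).reshape(3, 4)

# We want the cells (0, 3), (2, 0), (1, 1), and (2, 3).
rows = np.array([0, 2, 1, 2])
cols = np.array([3, 0, 1, 3])

# Vectorized lookup: one expression fetches all four cells.
looked_up = m[rows, cols]

# The explicit loop it replaces, for comparison.
loop = np.array([m[r, c] for r, c in zip(rows, cols)])

print(looked_up.tolist())  # prints: [3, 8, 5, 11]
```

In R the analogous idiom is to index a matrix with a two-column matrix of (row, column) positions, as in `m[cbind(rows, cols)]`.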
