Categories: data science, Mathematics, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Statistics To English Translation, Tutorials

Monitoring for Changes in Distribution with Resampling Tests

A client recently came to us with a question: what’s a good way to monitor data or model output for changes? That is, how can you tell if new data is distributed differently from previous data, or if the distribution of scores returned by a model has changed? This client, like many others who have faced the same problem, simply checked whether the mean and standard deviation of the data had changed by more than some amount, where the threshold was selected in a more or less ad hoc manner. But they were curious whether there was some other, perhaps more principled, way to check for a change in distribution.
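One generic way to make such a check more principled is a permutation (resampling) test. The sketch below is only an illustration and is not necessarily the method used in the full article: the helper name `perm_test`, the choice of statistic (absolute difference in means), and the simulated data are all assumptions made for this example.

```r
# Hypothetical helper (not from the article): two-sample permutation test.
# Under the null hypothesis that both samples come from the same distribution,
# the group labels are exchangeable, so we shuffle them to build a reference
# distribution for the chosen statistic.
perm_test <- function(reference, current,
                      stat = function(a, b) abs(mean(a) - mean(b)),
                      n_perm = 2000) {
  observed <- stat(reference, current)
  pooled <- c(reference, current)
  n_ref <- length(reference)
  perm_stats <- vapply(seq_len(n_perm), function(i) {
    shuffled <- sample(pooled)
    stat(shuffled[seq_len(n_ref)], shuffled[-seq_len(n_ref)])
  }, numeric(1))
  # Add-one correction keeps the estimated p-value strictly above zero.
  p_value <- (1 + sum(perm_stats >= observed)) / (1 + n_perm)
  list(observed = observed, p_value = p_value)
}

# Simulated data, purely for illustration.
set.seed(1)
reference <- rnorm(500, mean = 0, sd = 1)    # "previous" data
same      <- rnorm(500, mean = 0, sd = 1)    # new data, same distribution
shifted   <- rnorm(500, mean = 0.5, sd = 1)  # new data, shifted mean

res_same    <- perm_test(reference, same)
res_shifted <- perm_test(reference, shifted)
print(res_same$p_value)
print(res_shifted$p_value)
```

A small p-value for the shifted sample suggests that the difference is unlikely under the "no change" hypothesis. Swapping in a different `stat` function (for example, a difference in standard deviations or quantiles) lets the same test target other kinds of change.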


Categories: Administrativia, data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics To English Translation, Tutorials

What is New For vtreat 1.5.2?

vtreat version 1.5.2 just became available from CRAN.

We have logged a few improvements in the NEWS. The changes are small and incremental, as the package is already in a stable state suitable for production use.


Categories: data science, Statistics, Tutorials

New improved cdata instructional video

We have a new, improved version of the “how to design a cdata/data_algebra data transform” video up!

The original article, the Python example, and the R example have all been updated to use the new video.

Please check it out!

Categories: Administrativia, data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning

New Data Scientist Stickers

We have a new data scientist sticker!


If you see Nina or John at a conference/MeetUp, please ask us for a sticker!

Categories: Administrativia

wrapr Update: Removing Some Under-Used Functions and Classes

For the next version of the R package wrapr we are going to be removing a number of under-used functions/methods and classes. This update will likely happen in March 2020, and is the start of the wrapr 2.* series.

Most of the items being removed are different abstractions for helping with function composition. We ended up moving most of our work to category-theory based composition, so we don’t think these various frameworks are needed any longer. If you have been using these items in your own projects, please reach out and we will try to find a way to help you out.


Categories: Tutorials

R Tip: Check What Repos You are Using

In a lot of our R writing we casually say “install from CRAN using install.packages('PKGNAME')” or “update your packages by using update.packages(ask = FALSE, checkBuilt = TRUE) (and answering ‘no’ to all questions about compiling).”

We recently became aware that for some users this isn’t complete advice.
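As a minimal sketch of how to check this (the excerpt does not show what the full tip recommends, so the fallback step below is our assumption), base R stores the repositories used by `install.packages()` and `update.packages()` in the `repos` option:

```r
# Show the repositories that install.packages() and update.packages()
# will use by default in this session.
repos <- getOption("repos")
print(repos)

# Assumption for illustration: if no CRAN mirror has been chosen yet
# (R marks this with the placeholder "@CRAN@"), point the session at the
# CRAN cloud mirror. Adjust this to your own site's policy.
if (is.null(repos) || identical(unname(repos["CRAN"]), "@CRAN@")) {
  options(repos = c(CRAN = "https://cloud.r-project.org"))
}
print(getOption("repos"))
```

If the printed value points somewhere other than a current CRAN mirror (for example, a frozen snapshot or an internal repository), then “install from CRAN” may not deliver the latest package versions.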


Categories: data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials

Data re-Shaping in R and in Python

Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version of the tutorial.

This reflects our opinion on the question “which is better for data science: R or Python?” Both are great. So start with one, and expect to eventually work with both (if you are lucky).


Categories: Administrativia, data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials

wrapr 1.9.6 is now up on CRAN

wrapr 1.9.6 is now up on CRAN.

We unfortunately usually forget to say this: a big thank you to the staff and volunteers at CRAN.


Categories: Exciting Techniques, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Tutorials

Why we wrote wrapr to/unpack

One reason we are developing the wrapr to/unpack methods is the following: we wanted to spruce up the R vtreat interface a bit.


Categories: data science, Statistics, Tutorials

Using unpack to Manage Your R Environment

In our last note we stated that unpack is a good tool for loading R RDS files into your working environment. Here is the idea expanded into a worked example.
