Free Coupon for our R Video Course: Introduction to Data Science

For all our remote learners, we are sharing a free coupon code for our R video course Introduction to Data Science. The code is ITDS2020, and it can be used at this URL: https://www.udemy.com/course/introduction-to-data-science/?couponCode=ITDS2020 . Please check it out and share it!

A Little Something From Practical Data Science with R Chapter 1

Here is a small quote from Practical Data Science with R Chapter 1.

It is often too much to ask for the data scientist to become a domain expert. However, in all cases the data scientist must develop strong domain empathy to help define and solve the right problems.

Interested? Please check it out.

Use the Same Cross-Plan Between Steps

Students have asked me whether it is better to use the same cross-validation plan in each step of an analysis or to use different ones. My answer is: unless you are coordinating the plans in some way (such as 2-way independence or some sort of combinatorial design), it is generally better to use one plan. That way, minor information leaks at each stage explore less of the output variation and don't combine into worse leaks.

I am now sharing a note that works through all of the above with specific examples: “Multiple Split Cross-Validation Data Leak” (a follow-up to our larger article “Cross-Methods are a Leak/Variance Trade-Off”).
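
As a rough illustration of what "one plan" means in practice, here is a minimal Python sketch using only the standard library. It is not the code from the note: the helper names (`make_cross_plan`, `cross_mean_encode`, `cross_fit_predict`), the synthetic data, and the choice of a simple out-of-fold mean encoding followed by a one-variable least-squares fit are all assumptions made for the example. The point is only that the split is built once and passed to both steps.

```python
import random

def make_cross_plan(n_rows, n_folds, seed=0):
    """Return a list of (train_indices, test_indices) pairs; each row is in exactly one test set."""
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    folds = [sorted(idx[k::n_folds]) for k in range(n_folds)]
    plan = []
    for test in folds:
        test_set = set(test)
        train = [i for i in range(n_rows) if i not in test_set]
        plan.append((train, test))
    return plan

def cross_mean_encode(levels, y, plan):
    """Step 1 (hypothetical): code each row by the mean of y for its level, estimated out-of-fold."""
    coded = [None] * len(y)
    for train, test in plan:
        grand_mean = sum(y[i] for i in train) / len(train)
        sums, counts = {}, {}
        for i in train:
            sums[levels[i]] = sums.get(levels[i], 0.0) + y[i]
            counts[levels[i]] = counts.get(levels[i], 0) + 1
        for i in test:
            lv = levels[i]
            # Fall back to the training grand mean for levels unseen in training.
            coded[i] = sums[lv] / counts[lv] if lv in counts else grand_mean
    return coded

def cross_fit_predict(x, y, plan):
    """Step 2 (hypothetical): out-of-fold predictions from a one-variable least-squares line."""
    preds = [None] * len(y)
    for train, test in plan:
        n = len(train)
        mean_x = sum(x[i] for i in train) / n
        mean_y = sum(y[i] for i in train) / n
        sxx = sum((x[i] - mean_x) ** 2 for i in train)
        sxy = sum((x[i] - mean_x) * (y[i] - mean_y) for i in train)
        slope = sxy / sxx if sxx > 0 else 0.0
        for i in test:
            preds[i] = mean_y + slope * (x[i] - mean_x)
    return preds

# Synthetic example data (an assumption for illustration only).
rng = random.Random(2020)
n = 200
levels = [rng.choice("abcde") for _ in range(n)]
y = [(1.0 if lv in "ab" else 0.0) + rng.gauss(0, 0.5) for lv in levels]

plan = make_cross_plan(n, n_folds=5, seed=1)  # the cross-plan is built once
coded = cross_mean_encode(levels, y, plan)    # step 1 uses the plan
preds = cross_fit_predict(coded, y, plan)     # step 2 reuses the same plan
```

The alternative being argued against would call `make_cross_plan` a second time with a different seed before step 2, so that the two steps split the rows differently.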

New Data Scientist Stickers

We have a new data scientist sticker!

If you see Nina or John at a conference/MeetUp, please ask us for a sticker!

New Year’s Resolution 2020: Work on more R Data Science Projects

We had such a positive reception to our last Introduction to Data Science promotion that we are going to try to make the course available to more people by lowering the base price to $29.99. We are also creating a 1-month promotional price of $20.99. To get a permanent subscription to the course for less than $21, just visit https://www.udemy.com/course/introduction-to-data-science/ and use the discount code ITDS21 any time in January 2020.

Combine this with the new second edition of Practical Data Science with R, and you have a great study set to succeed at substantial statistical modeling and analytics tasks using the R programming language.


[Image: PDSwR2Lego]

(Note: Lego mini-fig not included!)

PyData Los Angeles 2019 talk: Preparing Messy Real World Data for Supervised Machine Learning

Video of our PyData Los Angeles 2019 talk Preparing Messy Real World Data for Supervised Machine Learning is now available. In this talk we describe how to use vtreat, a package available in R and in Python, to correctly re-code real-world data for supervised machine learning tasks.

Please check it out.

(Slides are also here.)

Nina Zumel and John Mount speaking on vtreat at PyData LA 2019

As we have announced before, we have ported the R version of vtreat to a new Python version of vtreat.

Our latest news is: we are speaking about the Python version at PyData LA 2019 (Thursday 10:50 AM–11:35 AM in Track 2 Room).

Practical Data Science with R, 2nd Edition, IS OUT!!!!!!!

Practical Data Science with R, 2nd Edition author Dr. Nina Zumel, with a fresh author’s copy of her book!

Preparing Data for Supervised Classification

Nina Zumel has been polishing up new vtreat for Python documentation and tutorials. They are coming out so well that, to be fair to the R community, I find I must start back-porting this new documentation to vtreat for R.

The Advantages of Record Transform Specifications

Nina Zumel wrote a really great article on how to prepare a nice Keras performance plot using R.


[Image: Keras plot]

I will use this example to show some of the advantages of cdata record transform specifications.
