For all our remote learners, we are sharing a free coupon code for our R video course Introduction to Data Science. The code is ITDS2020, and can be used at this URL https://www.udemy.com/course/introduction-to-data-science/?couponCode=ITDS2020 . Please check it out and share it!
Students have asked me if it is better to use the same cross-validation plan in each step of an analysis or to use different ones. Our answer is: unless you are coordinating the many plans in some way (such as 2-way independence or some sort of combinatorial design) it is generally better to use one plan. That way minor information leaks at each stage explore less of the output variations, and don’t combine into worse leaks.
I am now sharing a note that works all of the above as specific examples: “Multiple Split Cross-Validation Data Leak” (a follow-up to our larger article “Cross-Methods are a Leak/Variance Trade-Off”).
We have a new data scientist sticker!
If you see Nina or John at a conference/MeetUp, please ask us for a sticker!
We had such a positive reception to our last Introduction to Data Science promotion, that we are going to try and make the course available to more people by lowering the base-price to $29.99. We are also creating a 1 month promotional price of $20.99. To get a permanent subscription to the course for less than $21 just visit this link https://www.udemy.com/course/introduction-to-data-science/ and use the discount code
ITDS21 any time in January of 2020.
Combine this with the new second edition of Practical Data Science with R, and you have a great study set to succeed at substantial statistical modeling and analytics tasks using the R programming language.
(Note: Lego mini-fig not included!)
Video of our PyData Los Angeles 2019 talk Preparing Messy Real World Data for Supervised Machine Learning is now available. In this talk describe how to use vtreat, a package available in R and in Python, to correctly re-code real world data for supervised machine learning tasks.
Please check it out.
(Slides are also here.)
Practical Data Science with R, 2nd Edition author Dr. Nina Zumel, with a fresh author’s copy of her book!
Nina Zumel has been polishing up new
Python documentation and tutorials. They are coming out so good that I find to be fair to the
R community I must start to back-port this new documentation to
I will use this example to show some of the advantages of
cdata record transform specifications.