Categories: data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials

Data re-Shaping in R and in Python

Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version.

This reflects our opinion on the question “which is better for data science, R or Python?” Both are great. So start with one, and expect to eventually work with both (if you are lucky).

Categories: Administrativia, data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials

wrapr 1.9.6 is now up on CRAN

wrapr 1.9.6 is now up on CRAN.

We unfortunately usually forget to say this. A big thank you to the staff and volunteers at CRAN.

Categories: Administrativia, art, Opinion

Off topic: Horror Translations by Nina Zumel

In an off-topic post, we would like to share a series of horror narrations based on translations of Uruguayan author Horacio Quiroga by Win-Vector LLC’s very own Nina Zumel. This is a free series produced by Rue Morgue.

The first is: “The Feather Pillow.” DO NOT LISTEN TO THIS IN BED!

(YouTube link, Rue Morgue link, Ephemera link)

More of Nina’s literary work can be found at: Ephemera Experiments in Writing, and Multo (Ghost).

Categories: Exciting Techniques, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Tutorials

Why we wrote wrapr to/unpack

One reason we are developing the wrapr to/unpack methods is the following: we wanted to spruce up the R vtreat interface a bit.

Categories: data science, Statistics, Tutorials

Using unpack to Manage Your R Environment

In our last note we stated that unpack is a good tool for loading R RDS files into your working environment. Here is the idea expanded into a worked example.

Categories: Exciting Techniques, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials

sklearn Pipe Step Interface for vtreat

We’ve been experimenting with this for a while, and the next release of the R vtreat package will include a back-port of the Python vtreat package’s sklearn pipe step interface (in addition to the standard R interface).

Categories: data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials

New vtreat Feature: Nested Model Bias Warning

For quite a while we have been teaching that estimating variable re-encodings on the exact same data that is later naively used to train a model leads to an undesirable nested model bias. The vtreat package (both the R version and the Python version) incorporates a cross-frame method that allows one to use all the training data both to learn variable re-encodings and to correctly train a subsequent model (for an example please see our recent PyData LA talk).

The next version of vtreat will warn the user if they have improperly used the same data for both vtreat impact code inference and downstream modeling. So in addition to us warning you not to do this, the package now also checks and warns against this situation. vtreat has had methods for avoiding nested model bias for a very long time; we are now adding new warnings to confirm users are using them.
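
To make the bias concrete, here is a minimal sketch in plain Python. It is not vtreat's implementation: the function names (`level_means`, `naive_encode`, `cross_encode`, `correlation`) and the synthetic data are assumptions made for illustration only. It impact-codes a many-level categorical variable that is pure noise, once on the same rows used to estimate the code and once out-of-fold, in the spirit of a cross-frame.

```python
import random
from collections import defaultdict


def level_means(levels, ys):
    """Return {level: mean outcome over the rows carrying that level}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for lev, y in zip(levels, ys):
        sums[lev] += y
        counts[lev] += 1
    return {lev: sums[lev] / counts[lev] for lev in sums}


def naive_encode(levels, ys):
    """Estimate the encoding and apply it on the very same rows."""
    means = level_means(levels, ys)
    return [means[lev] for lev in levels]


def cross_encode(levels, ys, n_folds=5):
    """Encode each row from level means estimated on the other folds only."""
    n = len(ys)
    encoded = [0.0] * n
    for fold in range(n_folds):
        # Rows are already in random order, so i % n_folds is a fair split.
        train = [i for i in range(n) if i % n_folds != fold]
        grand_mean = sum(ys[i] for i in train) / len(train)
        means = level_means([levels[i] for i in train], [ys[i] for i in train])
        for i in range(fold, n, n_folds):
            # Levels unseen in the other folds fall back to the grand mean.
            encoded[i] = means.get(levels[i], grand_mean)
    return encoded


def correlation(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Synthetic data: the categorical variable carries no information about the outcome.
rng = random.Random(2020)
n_rows = 1000
levels = [rng.randrange(500) for _ in range(n_rows)]
outcome = [rng.randint(0, 1) for _ in range(n_rows)]

naive_corr = correlation(naive_encode(levels, outcome), outcome)
cross_corr = correlation(cross_encode(levels, outcome), outcome)
print(f"naive: {naive_corr:.2f}  cross-frame style: {cross_corr:.2f}")
```

Because the variable is noise by construction, any apparent relationship to the outcome is an artifact. The naive encoding should show a clearly positive correlation with the outcome on the training rows, which is what would mislead a downstream model, while the out-of-fold encoding should stay near zero.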

Set up the Example

This example is excerpted from some of our classification documentation.

Categories: Administrativia, Opinion, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning

New Year’s Resolution 2020: Work on more R Data Science Projects

We had such a positive reception to our last Introduction to Data Science promotion that we are going to try to make the course available to more people by lowering the base price to $29.99. We are also creating a one-month promotional price of $20.99. To get a permanent subscription to the course for less than $21, just visit this link https://www.udemy.com/course/introduction-to-data-science/ and use the discount code ITDS21 any time in January of 2020.

Combine this with the new second edition of Practical Data Science with R, and you have a great study set to succeed at substantial statistical modeling and analytics tasks using the R programming language.


(Image: PDSwR2Lego)

(Note: Lego mini-fig not included!)

Categories: Administrativia, data science, Practical Data Science

Manning Deal of the Day January 3, 2020: Half off Practical Data Science with R, Second Edition

Manning Deal of the Day January 3, 2020: Half off Practical Data Science with R, Second Edition. Use code dotd010320au at http://bit.ly/39vD1G4

Please share!