
A Little Something From Practical Data Science with R Chapter 1

Here is a small quote from Practical Data Science with R Chapter 1.

It is often too much to ask for the data scientist to become a domain expert. However, in all cases the data scientist must develop strong domain empathy to help define and solve the right problems.

Interested? Please check it out.


Off topic: Horror Translations by Nina Zumel

In an off-topic post we would like to share a series of horror narrations based on Win Vector LLC’s very own Nina Zumel’s translations of Uruguayan author Horacio Quiroga. This is a free series produced by Rue Morgue.

The first is: “The Feather Pillow.” DO NOT LISTEN TO THIS IN BED!

(YouTube link, Rue Morgue link, Ephemera link)

More of Nina’s literary work can be found at: Ephemera Experiments in Writing, and Multo (Ghost).


New Year’s Resolution 2020: Work on more R Data Science Projects

We had such a positive reception to our last Introduction to Data Science promotion that we are going to try to make the course available to more people by lowering the base price to $29.99. We are also creating a one-month promotional price of $20.99. To get a permanent subscription to the course for less than $21, just visit https://www.udemy.com/course/introduction-to-data-science/ and use the discount code ITDS21 any time in January 2020.

Combine this with the new second edition of Practical Data Science with R, and you have a great study set to succeed at substantial statistical modeling and analytics tasks using the R programming language.


[Image: PDSwR2Lego]

(Note: Lego mini-fig not included!)


New Timings for a Grouped In-Place Aggregation Task

I’d like to share some new timings on a grouped in-place aggregation task. A client of mine was seeing some slow performance, so I decided to time a very simple abstraction of one of the steps of their workflow.



Practical Data Science with R 2nd Edition update

We are in the last stages of proofing the galleys/typesetting of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019. So this edition will definitely be out soon!

If you ever wanted to see what Nina Zumel and John Mount are like when we have the help of editors, this book is your chance!

One thing I noticed in working through the galleys: it becomes easy to see why Dr. Nina Zumel is first author.

Two-thirds of the book is her work.


Free R/datascience Extract: Evaluating a Classification Model with a Spam Filter

We are excited to share a free extract of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019: Evaluating a Classification Model with a Spam Filter.


This section reflects an important design decision in the book: teach model evaluation first, and as a step separate from model construction.

It is funny, but it takes some effort to teach in this way. New data scientists want to dive into the details of model construction first, and statisticians are used to getting model diagnostics as a side-effect of model fitting. However, to compare different modeling approaches one really needs good model evaluation that is independent of the model construction techniques.
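As a minimal sketch of what evaluation independent of model construction can look like (this is an illustration, not the book's code), the Python functions below score any classifier purely from its predictions. The labels and the two "models" are made-up data, and the names `confusion_counts` and `evaluate` are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count outcomes for a binary task where True means 'spam'."""
    counts = Counter(zip(actual, predicted))
    return {
        "tp": counts[(True, True)],    # spam, flagged as spam
        "fp": counts[(False, True)],   # not spam, flagged as spam
        "fn": counts[(True, False)],   # spam, missed
        "tn": counts[(False, False)],  # not spam, left alone
    }


def evaluate(actual, predicted):
    """Return accuracy, precision, and recall from predictions alone."""
    c = confusion_counts(actual, predicted)
    total = sum(c.values())
    flagged = c["tp"] + c["fp"]
    spam = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "precision": c["tp"] / flagged if flagged else float("nan"),
        "recall": c["tp"] / spam if spam else float("nan"),
    }


# Made-up held-out labels and predictions from two hypothetical models.
actual = [True, True, True, False, False, False, False, False]
model_a = [True, True, False, False, False, False, False, True]
model_b = [True, False, False, False, False, False, False, False]

for name, predicted in [("model_a", model_a), ("model_b", model_b)]:
    print(name, evaluate(actual, predicted))
```

The evaluation code never looks inside either model, so the same function can compare approaches built with entirely different techniques. On this toy data both models have the same accuracy but differ in precision and recall, which is the kind of difference a fitting-time diagnostic alone would not surface.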

This teaching style has worked very well for us both in R and in Python (it is considered one of the merits of our LinkedIn AI Academy course design):

One of the best data science courses I’ve taken. The course focuses on model selection and evaluation which are usually underestimated. Thanks to John Mount, the teacher and the co-authors of Practical Data Science with R. #AI200

(Note: Nina Zumel leads the course design, which is the heavy lifting; John Mount just got tasked with delivering it.)

Zumel, Mount, Practical Data Science with R, 2nd Edition is coming out in print very soon. Here is a discount code to help you get a good deal on the book:

Take 37% off Practical Data Science with R, Second Edition by entering fcczumel3 into the discount code box at checkout at manning.com.


AI for Engineers

For the last year we (Nina Zumel and I, John Mount) have had the honor of teaching the AI200 portion of LinkedIn’s AI Academy.


John Mount at the LinkedIn campus

Nina Zumel designed most of the material, and John Mount has been delivering it and bringing her feedback. We’ve just started our 9th cohort. We adjust the course each time. Our students teach us a lot about how one thinks about data science. We bring that forward to each round of the course.

Roughly the goal is the following.

If every engineer, product manager, and project manager had some hands-on experience with data science and AI (deep neural nets), they would be more likely both to think of using these techniques in their work and to introduce the instrumentation required to have useful data in the first place.

This will have huge downstream benefits for LinkedIn. Our group is thrilled to be a part of this.

We are looking for more companies that want an on-site data science intensive for their teams (either in Python or in R).


How to Prepare Data

Real-world data can present a number of challenges to data science workflows. Even properly structured data (each interesting measurement already landed in its own column) can present problems, such as missing values and high-cardinality categorical variables.

In this note we describe some great tools for working with such data.
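To make the two problems concrete, here is a rough Python sketch of one common way to handle each. It is an illustration under stated assumptions, not necessarily what the tools in the full note do: the function names `impute_with_indicator` and `mean_outcome_code` are hypothetical, and the data is made up.

```python
from collections import defaultdict
from statistics import mean


def impute_with_indicator(values):
    """Fill missing numeric values (None) with the observed mean.

    Also returns a parallel list of flags recording which entries were
    missing, so a model can still use the fact that a value was absent.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return filled, was_missing


def mean_outcome_code(levels, outcomes):
    """Replace each categorical level with the mean outcome for that level.

    This turns a column with many distinct levels into a single numeric
    column. Caution: computing the codes on the same rows used to fit a
    model can overfit; in practice the codes should come from separate
    or cross-validated data.
    """
    by_level = defaultdict(list)
    for level, y in zip(levels, outcomes):
        by_level[level].append(y)
    level_means = {level: mean(ys) for level, ys in by_level.items()}
    return [level_means[level] for level in levels]


# Made-up example columns.
ages = [30, None, 50]
zip_codes = ["94110", "94110", "10001"]
purchased = [1, 0, 1]

filled_ages, age_missing = impute_with_indicator(ages)
zip_coded = mean_outcome_code(zip_codes, purchased)
print(filled_ages, age_missing, zip_coded)
```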



Practical Data Science with R update

Just got the following note from a new reader:

Thank you for writing Practical Data Science with R. It’s challenging for me, but I am learning a lot by following your steps and entering the commands.

Wow, this is exactly what Nina Zumel and I hoped for. We wish we could make everything easy, but an appropriate amount of challenge is required for significant learning and accomplishment.

Of course we try to avoid inessential problems. All of the code examples from the book can be found here (and all the data sets here).

The second edition is coming out very soon. Please check it out.