
# Tag: Practical Data Science with R

- Four Years of Practical Data Science with R
- Hangul/Korean edition of Practical Data Science with R!
- Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article
- Did she know we were writing a book?
- For a short time: Half Off Some Manning Data Science Books
- Half off Win-Vector data science books and video training!
- Practical Data Science with R examples
- Thank you Joseph Rickert!
- The Win-Vector R data science value pack
- Vtreat: designing a package for variable treatment

Four years ago today, we (authors Nina Zumel and John Mount) received our author’s copies of Practical Data Science with R!


We are excited to see our new Hangul/Korean edition of “Practical Data Science with R” by Nina Zumel and John Mount, translated by Daekyoung Lim.


We have updated the errata for Practical Data Science with R to reflect that it is no longer worth the effort to use the Java version of SQLScrewdriver as described.

We are very sorry for any confusion, trouble, or wasted effort that bringing in Java software (something we are very familiar with, but forget not everybody uses) has caused readers. Also, database adapters for R have greatly improved, so we feel more confident depending on them alone. Practical Data Science with R remains an excellent book and a good resource to learn from; we are very proud of it and fully support it (hence the errata).

Writing a book is a sacrifice. It takes a lot of time, represents a lot of missed opportunities, and does not (directly) pay very well. If you do a good job it may pay back in good-will, but producing a serious book is a great challenge.

Nina Zumel and I definitely deliberated over the possibilities for some time before deciding to write *Practical Data Science with R* (Nina Zumel, John Mount, Manning 2014).

In the end we worked very hard to organize and share a lot of good material in what we feel is a very readable manner. But I think the first author may have been signaling and preparing a bit before I was even aware we were writing a book. Please read on to see some of her prefiguring work.

Our publisher, Manning Publications, is celebrating the release of a new Python data science title, *Introducing Data Science*, by offering it and other Manning titles at half off until Wednesday, May 18.

As part of the promotion, you can also use the supplied discount code `mlcielenlt` for half off some R titles, including *R in Action*, Second Edition, and our own *Practical Data Science with R*. Combine these with our half-off code (`C3`) for our R video course *Introduction to Data Science* and you can get a lot of top-quality data science material at a deep discount.

We are pleased to announce that our book Practical Data Science with R (Nina Zumel, John Mount, Manning 2014) is part of Manning’s “Deal of the Day” for April 9th, 2016. This one-day-only offer gets you half off the physical book (with a free e-copy) or the e-copy alone (PDF, ePub, and Kindle formats together, and DRM free!).

Here is the discount code in Tweetable form (please Tweet/share!):

Deal of the Day April 9: Half off my book Practical Data Science with R. Use code `dotd040916au` at https://www.manning.com/books/practical-data-science-with-r

In celebration of this, our video instruction course Introduction to Data Science (Nina Zumel, John Mount 2015) is also half off with code `C3` (https://www.udemy.com/introduction-to-data-science/?couponCode=C3).

One of the big points of Practical Data Science with R is to supply a large number of fully worked examples. Our intent has always been for readers to read the book and, if they want to follow up on a data set or technique, to find the matching worked examples in the project directory of our book support materials git repository.

Some readers want to work much closer to the sequence in the book. To make working along with the book easier, we extracted all book examples and shared them with our readers (in a GitHub directory and as a downloadable zip file; press “Raw” to download). The direct extraction from the book guarantees the files are in sync with our revised book. However, there are trade-offs: sometimes (for legibility) the book mixes input and output without using R’s comment conventions, so you can’t always just paste everything. Also, for a snippet to run you may need some libraries, data, and the results of previous snippets to be present in your R environment.

To help these readers we have added a new section to the book support materials: knitr markdown sheets that run all the book extracts from each chapter. Each chapter and appendix now has a matching markdown file that sets up the correct context to run each and every snippet extracted from the book. In principle you can now clone the entire zmPDSwR repository to your local machine and run all the extracts from the CodeExamples directory by using the RStudio project in RunExamples. Correct execution also depends on having the right packages installed, so we have also added a worksheet showing everything we expect to see installed in one place: InstallAll.Rmd (note that some of the packages require external dependencies to work, such as a C compiler, curl libraries, and a Java framework).
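Before knitting the chapter worksheets, it can help to check which required packages are still missing. The sketch below compares a list of package names against `installed.packages()` using only base R; the package list and the helper name `missing_packages` are hypothetical placeholders, not the actual contents of InstallAll.Rmd.

```r
# Hypothetical package list for illustration only;
# substitute the real list from InstallAll.Rmd.
required_pkgs <- c("knitr", "rmarkdown", "ggplot2")

# Hypothetical helper: return the names of required packages
# that are not installed in any library on .libPaths().
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(required_pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```

This only detects missing R packages; it will not catch the external system dependencies (C compiler, curl libraries, Java) that some packages need when they are installed.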

A bit of text we are proud to steal from our good friend Joseph Rickert:

Then, for some very readable background material on SVMs I recommend section 13.4 of *Applied Predictive Modeling* and sections 9.3 and 9.4 of *Practical Data Science with R* by Nina Zumel and John Mount. You will be hard pressed to find an introduction to kernel methods and SVMs that is as clear and useful as this last reference.

For more on SVMs see the original article on the Revolution Analytics blog.

Win-Vector LLC is proud to announce the R data science value pack. 50% off our video course *Introduction to Data Science* (available at Udemy) and 30% off *Practical Data Science with R* (from Manning). Pick any combination of video, e-book, and/or print-book you want. Instructions below.

Please share and Tweet!

When you apply machine learning algorithms on a regular basis, on a wide variety of data sets, you find that certain data issues come up again and again:

- Missing values (`NA` or blanks)
- Problematic numerical values (`Inf`, `NaN`, sentinel values like 999999999 or -1)
- Valid categorical levels that don’t appear in the training data (especially when there are rare levels, or a large number of levels)
- Invalid values
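As a rough illustration of one common way to handle the numerical issues above, the base R sketch below learns a replacement value from the training data (here simply the mean of the valid values, an assumption made for illustration) and substitutes it for `NA`, `NaN`, `Inf`, and any user-specified sentinel values, while adding an indicator column that records which rows were replaced. The helper names are hypothetical; this is not the interface of the package described in this article.

```r
# Hypothetical helper: learn a treatment for one numeric column from training data.
# Values that are NA, NaN, +/-Inf, or listed in `sentinels` are treated as "bad".
design_numeric_treatment <- function(x, sentinels = numeric(0)) {
  bad <- !is.finite(x) | x %in% sentinels
  list(fill = mean(x[!bad]), sentinels = sentinels)
}

# Hypothetical helper: apply a learned treatment to (possibly new) data.
# Bad values are replaced by the training mean, and an indicator column
# records which rows were replaced so the model can still see that signal.
apply_numeric_treatment <- function(plan, x) {
  bad <- !is.finite(x) | x %in% plan$sentinels
  data.frame(clean = ifelse(bad, plan$fill, x),
             is_bad = as.numeric(bad))
}

train_x <- c(1, 2, NA, 3, Inf, -1)
num_plan <- design_numeric_treatment(train_x, sentinels = -1)
apply_numeric_treatment(num_plan, c(4, NaN, -1, -Inf))
```

Keeping the indicator column matters because missingness is often informative: if values are missing systematically rather than at random, the model can still use that signal. If every training value is bad, `mean()` of an empty vector returns `NaN`, so a fuller implementation would need a fallback for that case.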

Of course, you should examine the data to understand the nature of the data issues: are the missing values missing at random, or are they systematic? What are the valid ranges for the numerical data? Are there sentinel values, what are they, and what do they mean? What are the valid values for text fields? Do we know all the valid values for a categorical variable, and are there any missing? Is there any principled way to roll up category levels? In the end, though, the steps you take to deal with these issues will often be the same from data set to data set, so having a package of ready-to-go functions for data treatment is useful. In this article, we will discuss some of our usual data treatment procedures, and describe a prototype R package that implements them.
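Categorical variables need a different treatment. The base R sketch below (hypothetical helper names and an assumed minimum-count threshold, not the package's actual interface) keeps only training levels that occur at least `min_count` times, maps rare and never-before-seen levels onto a single `rare` level, and gives missing values their own `missing` level, so downstream modeling code never encounters an unknown factor level.

```r
# Hypothetical helper: record which training levels are common enough to keep.
design_categorical_treatment <- function(x, min_count = 2) {
  counts <- table(x)  # table() drops NA by default
  list(keep = names(counts)[counts >= min_count])
}

# Hypothetical helper: map values onto the kept levels.
# Rare and unseen levels become "rare"; missing values become "missing".
apply_categorical_treatment <- function(plan, x) {
  x <- as.character(x)
  out <- ifelse(is.na(x), "missing",
                ifelse(x %in% plan$keep, x, "rare"))
  factor(out, levels = c(plan$keep, "rare", "missing"))
}

train_color <- c("red", "red", "blue", "blue", "green", NA)
cat_plan <- design_categorical_treatment(train_color, min_count = 2)
apply_categorical_treatment(cat_plan, c("red", "green", "purple", NA))
```

One design caveat: if the data already contains a real level literally named `rare` or `missing`, the placeholder names would need to be chosen to avoid collisions.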
