Why you should read Nina Zumel’s 3 part series on principal components analysis and regression

Categories: Administrativia, Exciting Techniques, Expository Writing, Statistics, Tutorials

Short form:

Win-Vector LLC’s Dr. Nina Zumel has written a three-part series on Principal Components Regression that we think is well worth your time.

  • Part 1: the proper preparation of data (including scaling) and use of principal components analysis (particularly for supervised learning or regression).
  • Part 2: the introduction of y-aware scaling to direct the principal components analysis to preserve variation correlated with the outcome we are trying to predict.
  • Part 3: how to pick the number of components to retain for analysis.
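As a rough illustration of the three ideas listed above, here is a minimal base-R sketch on synthetic data. It is not code from the series: the data frame `d`, the variable names, and the coefficients are all made up for illustration. The y-aware step assumes the scaling multiplies each input by the slope of the single-variable regression of the outcome on that input and then centers it. The final step only inspects the share of variance per component; the series itself may recommend other criteria for choosing how many components to keep.

```r
# Synthetic data: every name and number here is an illustrative assumption.
set.seed(2016)
n <- 200
d <- data.frame(
  x1 = rnorm(n),            # moderate variance, related to y
  x2 = rnorm(n, sd = 10),   # large variance, unrelated to y
  x3 = rnorm(n, sd = 0.1)   # small variance, related to y
)
d$y <- 2 * d$x1 + 30 * d$x3 + rnorm(n)
xvars <- c("x1", "x2", "x3")

# Part 1 idea: x-only scaling (center, unit variance) before PCA.
pca_x <- prcomp(d[, xvars], center = TRUE, scale. = TRUE)

# Part 2 idea (assumed form of y-aware scaling): multiply each x by the
# slope of the one-variable regression y ~ x, then center the result.
y_aware <- as.data.frame(lapply(xvars, function(v) {
  m <- coef(lm(d$y ~ d[[v]]))[2]   # slope of y on this single variable
  xs <- m * d[[v]]
  xs - mean(xs)
}))
names(y_aware) <- xvars
pca_y <- prcomp(y_aware, center = FALSE, scale. = FALSE)

# Part 3 idea (one simple heuristic only): share of variance per component.
var_explained <- pca_y$sdev^2 / sum(pca_y$sdev^2)
print(round(var_explained, 3))
```

In this sketch, comparing `pca_x$rotation` with `pca_y$rotation` shows how the y-aware rescaling shrinks the influence of the high-variance but uninformative `x2`.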

Installing WVPlots and “knitting R markdown”

Categories: Administrativia, Tutorials

Some readers have been having a bit of trouble using devtools to install WVPlots (announced here and used to produce some of the graphs shown here). I thought I would write a note with a few instructions to help.

These are things you should not have to do often, and things those of us already running R have stumbled through and forgotten about. These are also the kind of finicky, system-dependent, non-repeatable interactive GUI steps you largely avoid once you have a scriptable system like R fully up and running.

For a short time: Half Off Some Manning Data Science Books

Categories: Administrativia, Pragmatic Data Science, Statistics

Our publisher, Manning Publications, is celebrating the release of a new data-science-in-Python title, Introducing Data Science, by offering it and other Manning titles at half off until Wednesday, May 18.

As part of the promotion you can also use the supplied discount code mlcielenlt for half off some R titles including R in Action, Second Edition and our own Practical Data Science with R. Combine these with our half off code (C3) for our R video course Introduction to Data Science and you can get a lot of top quality data science material at a deep discount.

Coming up: principal components analysis

Categories: Administrativia, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials

Just a “heads-up.”

I’ve been editing a three-part series Nina Zumel is writing on some of the pitfalls of improperly applied principal components analysis/regression and how to avoid them (we are using the plural spelling, following Everitt’s The Cambridge Dictionary of Statistics). The series is looking absolutely fantastic and I think it will really help people understand, properly use, and even teach the concepts.

The series includes fully worked graphical examples in R and is why we added the ScatterHistN plot to WVPlots (plot shown below, explained in the upcoming series).

[Figure: ScatterHistN plot from WVPlots]

Frankly the material would have worked great as an additional chapter for Practical Data Science with R (but instead everybody is going to get it for free).

Please watch here for the series.
The complete series is now up:

Improved vtreat documentation

Categories: Administrativia, Statistics, Tutorials

Nina Zumel has donated some time to greatly improve the vtreat R package documentation (now available as pre-rendered HTML here).

vtreat is an R data.frame processor/conditioner package that helps prepare real-world data for predictive modeling in a statistically sound manner.

Half off Win-Vector data science books and video training!

Categories: Administrativia, Practical Data Science, Statistics

We are pleased to announce that our book Practical Data Science with R (Nina Zumel, John Mount, Manning 2014) is part of Manning’s “Deal of the Day” for April 9th, 2016. This one-day-only offer gets you half off the physical book (with free e-copy) or the paid e-copy (simultaneous PDF + ePub + Kindle, and DRM free!).

Here is the discount code in Tweetable form (please Tweet/share!):

Deal of the Day April 9: Half off my book Practical Data Science with R. Use code dotd040916au at https://www.manning.com/books/practical-data-science-with-r

In celebration of this, our video instruction course Introduction to Data Science (Nina Zumel, John Mount 2015) is also half off with code “C3” (https://www.udemy.com/introduction-to-data-science/?couponCode=C3).

Upcoming Win-Vector LLC appearances

Categories: Administrativia, Statistics, Tutorials

Win-Vector LLC will be presenting on statistically validating models using R and data science at:

We will share code and examples.

Registration required (and Strata is a paid conference). Please Tweet/forward. We hope to see you soon!

More on preparing data

Categories: Administrativia, Opinion, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials

The Microsoft Data Science User Group just sponsored Dr. Nina Zumel’s presentation “Preparing Data for Analysis Using R”. Microsoft saw Win-Vector LLC’s ODSC West 2015 presentation “Prepping Data for Analysis using R” and generously offered to sponsor improving it and disseminating it to a wider audience.



We feel Nina really hit the ball out of the park with over 400 new live viewers. Read more for links to even more free materials!

Win-Vector video courses: price/status changes

Categories: Administrativia, Pragmatic Data Science, Pragmatic Machine Learning, Statistics

Win-Vector LLC has been offering a couple of online video courses on the topics of data science and A/B testing (both using R). These are high quality courses and well worth the money and time needed to work through them closely (with all materials distributed on GitHub).

Our current distributor is Udemy, which has just announced a unilateral change in pricing policy (March 2, 2016). This note is about the current status of these courses.

More Shiny user showcase demonstrations

Categories: Administrativia, data science, Programming, Statistics

We at Win-Vector LLC are very proud to announce that RStudio just inducted two more of our demonstration Shiny applications into their Shiny User Showcase gallery.