Archive for the ‘Administrativia’ Category

Practical Data Science with R: Release date announced

March 25th, 2014 5 comments


It took a little longer than we’d hoped, but we did it! Practical Data Science with R will be released on April 2nd (physical version). The eBook version will follow soon after, on April 15th. You can preorder the pBook now on the Manning book page. The physical version comes with a complimentary eBook version (when the eBook is released), in all three formats: PDF, ePub, and Kindle.

If you haven’t yet, order it now!

(softbound 416 pages, black and white; includes access to color PDF, ePub and Kindle when available)

Some statistics about the book

March 4th, 2014 1 comment

The Statistics behind “Verification by Multiplicity”

March 1st, 2014 No comments

There’s a new post up at the blog that looks at the statistics of “verification by multiplicity” — the statistical technique that is behind NASA’s announcement of 715 new planets that have been validated in the data from the Kepler Space Telescope.

We normally don’t write about science here at Win-Vector, but we do sometimes examine the statistics and statistical methods behind scientific announcements and issues. NASA’s new technique is a cute and relatively straightforward (statistically speaking) approach.

From what I understand of the introduction to the paper, there are two ways to determine whether a planet candidate is really a planet. The first is to confirm it with additional measurements of the target star’s gravitational wobble, or with measurements of the transit times of the apparent planets across the face of the star. Getting sufficient measurements can take time. The second is to “validate” the planet by showing that the sighting is highly unlikely to be a false positive. Specifically, the probability that the observed signal was caused by a planet should be at least 100 times larger than the probability that it is a false positive. The validation analysis is a Bayesian approach: it considers the various mechanisms that produce false positives, determines the probability that each could have produced the signal in question, and compares those probabilities to the probability that a planet produced the signal.
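To make the 100-to-1 criterion concrete, here is a minimal Python sketch of the comparison as I read it. It is not the paper's actual pipeline: the function names, the two false-positive mechanisms, and every number below are hypothetical, chosen only to show the arithmetic of weighing a planet model against the combined false-positive models.

```python
def validation_odds(planet_prior, planet_likelihood, false_positive_models):
    """Odds of the planet model against all false-positive models combined.

    false_positive_models is a list of (prior, likelihood) pairs, one per
    false-positive mechanism. All values are illustrative assumptions.
    """
    planet_weight = planet_prior * planet_likelihood
    fp_weight = sum(prior * likelihood for prior, likelihood in false_positive_models)
    return planet_weight / fp_weight


def is_validated(odds, threshold=100.0):
    """Apply the 'at least 100 times more probable' rule described above."""
    return odds >= threshold


# Hypothetical candidate: two made-up false-positive mechanisms.
fp_models = [
    (0.06, 0.05),  # assumed mechanism A: (prior, likelihood of the signal)
    (0.04, 0.02),  # assumed mechanism B: (prior, likelihood of the signal)
]
odds = validation_odds(planet_prior=0.90, planet_likelihood=0.50,
                       false_positive_models=fp_models)
print("odds:", odds, "validated:", is_validated(odds))
```

Because the priors enter the odds multiplicatively, raising the planet prior raises the odds in proportion, which is why a better-informed prior can move a candidate across the threshold.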

The basic idea behind verification by multiplicity is that planets are often clustered in multi-planet star systems, while false positive measurements (mistaken identification of potential planets) occur randomly. Putting this another way: if false positives are random, then they won’t tend to occur together near the same star. So if you observe a star with multiple “planet signals,” it’s unlikely that all the signals are false positives. We can use that observation to quantify how much more likely it is that a star with multiple candidates actually hosts a planet. The resulting probability can be used as an improved prior for the planet model when doing the statistical validation described above.
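The "false positives won't tend to occur together" step can be checked with a quick calculation. The Python sketch below assumes false positives land independently and uniformly at random on the target stars, and asks how many stars we would expect to pick up two or more of them by chance. The star and false-positive counts are invented for illustration (they are not Kepler figures), and the sketch ignores the mixed case of one real planet plus one false positive around the same star.

```python
from math import comb


def expected_stars_with_at_least(k, n_stars, n_false_positives):
    """Expected number of stars that receive at least k false positives.

    Assumes each false positive falls on one of n_stars stars independently
    and uniformly at random, so the count per star is binomial.
    """
    p = 1.0 / n_stars
    prob_fewer_than_k = sum(
        comb(n_false_positives, j) * p**j * (1.0 - p) ** (n_false_positives - j)
        for j in range(k)
    )
    return n_stars * (1.0 - prob_fewer_than_k)


# Hypothetical survey: 100,000 target stars and 500 false positives.
n_stars = 100_000
n_false_positives = 500
chance_multis = expected_stars_with_at_least(2, n_stars, n_false_positives)

# Hypothetical number of stars actually observed with 2+ candidates.
observed_multis = 300
print("expected chance multiples:", chance_multis)
print("share of observed multiples explained by chance:",
      chance_multis / observed_multis)
```

Under these made-up numbers only about one star would be expected to collect two false positives by chance, so if hundreds of multi-candidate stars are observed, nearly all of them must contain at least one real planet. That gap is what justifies a stronger planet prior for candidates in multi-candidate systems.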

You can read the rest of the article here.

One day discount on Practical Data Science with R

February 21st, 2014 1 comment

Please forward and share this discount offer for our upcoming book. Manning Deal of the Day February 22: Half off Practical Data Science with R. Use code dotd022214au at

Big News! Practical Data Science with R is content complete!

December 19th, 2013 3 comments

The last appendix has gone to the editors; the book is now content complete. What a relief!

We are hoping to release the book late in the first quarter of next year. In the meantime, you can still get early drafts of our chapters through Manning’s Early Access program, if you haven’t yet. The link is here.

We look forward to sharing the final version of the book with you next year.

Practical Data Science with R: Manning Deal of the Day November 19th 2013

November 19th, 2013 Comments off

Practical Data Science with R October 2013 update

October 26th, 2013 2 comments

A quick status update on our upcoming book “Practical Data Science with R” by Nina Zumel and John Mount.

We are really happy with how the book is coming out. We were able to cover almost everything we hoped to. Part 1 (especially chapter 3) is already being used in courses, and has some very good stuff on how to review data. Part 2 covers the “statistical / machine-learning canon,” and turns out to be a very complete demonstration of what odd steps are needed to move from start to finish for each example in R. Part 3 is going to finish with the important (but neglected) topics of delivering results to production, and building good documentation and presentations.

Just spoke at The Berkeley R Language Beginner Study Group

September 17th, 2013 2 comments

Just spoke at The Berkeley R Language Beginner Study Group. Great audience, very bright. Here are my slides: StartR.pdf.


Speaking at BARUG Wednesday, August 21, 2013

August 19th, 2013 1 comment

I’ll be talking at the “Official” BARUG meeting Wednesday, August 21, 2013. The RSVPs look full (sorry), but I wanted to post a thanks to the organizers for considering me. If things go well I’ll see if I can post the slides later (not sure if that is useful without detailed speaker’s notes).

Practical Data Science with R, deal of the day Aug 1 2013

July 31st, 2013 3 comments