Category Archives: Administrativia

Win-Vector LLC’s John Mount at Strata + Hadoop World October 2014

Win-Vector LLC‘s John Mount will be speaking at Strata + Hadoop World 2014 this month. Please attend my panel on data inventories (a key driver of data science project success) and attend my “Practical Data Science with R” book office hour (get your book signed!). Thank you both O’Reilly Media, Inc. and Waterline Data Science for making this possible.

IMG 0396

Current schedule/location details after the click. Continue reading

What is a win vector?

From time to time we are asked “what is the company name Win-Vector LLC referring to?” It is a cryptic pun trying to be an encoding of “we deliver victory.”

The story is an inside joke referring to something really only funny to one of the founders. But a joke that amuses the teller is always enjoyed by at least one person. Win-Vector LLC’s John Mount had the honor of co-authoring a 1997 paper titled “The Polytope of Win Vectors.” The paper title is obviously mathematical terms in an odd combination. However the telegraphic grammar is coincidentally similar to deliberately ungrammatical gamer slang such as “full of win” and “so much win.”

2829551 full of win

If we treat “win” as a concrete noun (say something you can put in a sack) and “vector” in its non-mathematical sense (as an entity of infectious transmission) we have “Win-Vector LLC is an infectious delivery of victory.” I.e.: we deliver success to our clients. Of course, we have now attempt to explain a weak joke. It is not as grand as “winged victory,” but it does encode a positive company value: Win-Vector LLC delivers successful data science projects and training to clients.

640px Nike of Samothrake Louvre Ma2369 n4
Winged Victory: from Wikipedia

Let’s take this as an opportunity to describe what a win vector is. Continue reading

Diversion: Win-Vector LLC’s Nina Zumel takes time off to publish a literary book review

Win-Vector LLC’s Nina Zumel takes some time off to publish a literary book review: Reading Red Spectres: Russian Gothic Tales.

Hundertwasser domes

Nina Zumel also examines aspects of the supernatural in literature and in folk culture at her blog, She writes about folklore, ghost stories, weird fiction, or anything else that strikes her fancy. Follow her on Twitter @multoghost.

Save 50% on Practical Data Science with R (and other titles) at Manning through May 30, 2014

Manning Publications Inc. is launching an exciting new MEAP: Practical Probabilistic Programming (which we have already subscribed to) by offering a 50% discount on Practical Probabilistic Programming and other titles (including Practical Data Science with R!). To get the discount put the books in your Manning shopping car and then add the promotional code ppplaunch50 (through May 30, 2014) into the coupon code field in the “other” section on towards the bottom of the account form. See below for other Manning books eligible for this generous discount. Continue reading

Save 45% on Practical Data Science with R (expires May 21, 2014)

Please share this generous deal from Manning publications: save 45% on Practical Data Science with R through May 21, 2014. Please tweet, forward and share!

Edit: we are going to try and keep the current best deals on the book at the bottom of the Practical Data Science with R page. So look there for updates (also the book is always available at so you may want to look what the discount there is).

Practical Data Science with R: Release date announced


It took a little longer than we’d hoped, but we did it! Practical Data Science with R will be released on April 2nd (physical version). The eBook version will follow soon after, on April 15th. You can preorder the pBook now on the Manning book page. The physical version comes with a complimentary eBook version (when the eBook is released), in all three formats: PDF, ePub, and Kindle.

If you haven’t yet, order it now!

(softbound 416 pages, black and white; includes access to color PDF, ePub and Kindle when available)

The Statistics behind “Verification by Multiplicity”

There’s a new post up at the blog that looks at the statistics of “verification by multiplicity” — the statistical technique that is behind NASA’s announcement of 715 new planets that have been validated in the data from the Kepler Space Telescope.

We normally don’t write about science here at Win-Vector, but we do sometimes examine the statistics and statistical methods behind scientific announcements and issues. NASA’s new technique is a cute and relatively straightforward (statistically speaking) approach.

From what I understand of the introduction to the paper, there are two ways to determine whether or not a planet candidate is really a planet: the first is to confirm the fact with additional measurements of the target star’s gravitational wobble, or by measurements of the transit times of the apparent planets across the face of the star. Getting sufficient measurements can take time. The other way is to “validate” the planet by showing that it’s highly unlikely that the sighting was a false positive. Specifically, the probability that the signal observed was caused by a planet should be at least 100 times larger than the probability that the signal is a false positive. The validation analysis is a Bayesian approach that considers various mechanisms that produce false positives, determines the probability that these various mechanisms could have produced the signal in question, and compares them to the probability that a planet produced the signal.

The basic idea behind verification by multiplicity is that planets are often clustered in multi-planet star systems, while false positive measurements (mistaken identification of potential planets) occur randomly. Putting this another way: if false positives are random, then they won’t tend to occur together near the same star. So if you observe a star with multiple “planet signals,” it’s unlikely that all the signals are false positives. We can use that observation to quantify how much more likely it is that a star with multiple candidates actually hosts a planet. The resulting probability can be used as an improved prior for the planet model when doing the statistical validation described above.

You can read the rest of the article here.