Categories: Administrativia, Expository Writing, Opinion, Practical Data Science, Statistics

Did she know we were writing a book?

Writing a book is a sacrifice. It takes a lot of time, represents a lot of missed opportunities, and does not (directly) pay very well. If you do a good job, it may pay back in goodwill, but producing a serious book is a great challenge.

Nina Zumel and I weighed the possibilities for some time before deciding to write Practical Data Science with R (Nina Zumel and John Mount, Manning, 2014).


In the end we worked very hard to organize and share a lot of good material in what we feel is a very readable manner. But I think the first author may have been signaling and preparing for the book a bit before I was aware we were writing one. Please read on to see some of her prefiguring work.

Categories: Administrativia, Computer Science, data science, Exciting Techniques, Statistics, Uncategorized

Our Differential Privacy Mini-series

We’ve just finished off a series of articles on some recent research results applying differential privacy to improve machine learning. Some of these results are pretty technical, so we thought it was worth working through concrete examples. And some of the original results are locked behind academic journal paywalls, so we’ve tried to touch on the highlights of the papers, and to play around with variations of our own.

Blurry snowflakes stock by cosmicgallifrey d3inho1

  • A Simpler Explanation of Differential Privacy: Quick explanation of epsilon-differential privacy, and an introduction to an algorithm for safely reusing holdout data, recently published in Science (Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth, “The reusable holdout: Preserving validity in adaptive data analysis”, Science, vol 349, no. 6248, pp. 636-638, August 2015).

    Note that Cynthia Dwork is one of the inventors of differential privacy, originally used in the analysis of sensitive information.

  • Using differential privacy to reuse training data: Specifically, how differential privacy helps you build efficient encodings of categorical variables with many levels from your training data without introducing undue bias into downstream modeling.
  • A simple differentially private-ish procedure: The bootstrap as an alternative to Laplace noise to introduce privacy.

Our R code and experiments are available on Github here, so you can try some experiments and variations yourself.

Image credit: “Blurry snowflakes stock” by cosmicgallifrey.

Categories: Administrativia, Practical Data Science, Statistics

Some key Win-Vector serial data science articles

As readers have surely noticed, the Win-Vector LLC blog isn’t a stream of short notes, but instead a collection of long technical articles. It is the only way we can properly treat topics of consequence.


What not everybody may have noticed is that a number of these articles are organized into series for deeper comprehension. The key series include:

  • Statistics to English translation.

    This series tries to find vibrant applications and explanations of standard good statistical practices, to make them more approachable to the non-statistician.

  • Statistics as it should be.

    This series tries to cover cutting-edge machine learning techniques, and then adapt and explain them in traditional statistical terms.

  • R as it is.

    This series tries to teach the statistical programming language R “warts and all” so we can see it as the versatile and powerful data science tool that it is.

To get a taste of what we are up to in our writing, please check out our blog highlights and these series. For deeper treatments of more operational topics, also check out our book Practical Data Science with R.

Or, if you have a particular problem you need solved, consider engaging us at Win-Vector LLC for data science consulting and/or training.

Categories: Administrativia, Expository Writing, Public Service Article, Tutorials

Great new post by Win-Vector’s Nina Zumel

Win-Vector LLC’s Nina Zumel has a great new article on the issue of taste in design and problem solving: Design, Problem Solving, and Good Taste. I think it is a big issue: how can you expect good work if you can’t even discuss how to tell good from bad?


Categories: Administrativia, art, Opinion

Diversion: Win-Vector LLC’s Nina Zumel takes time off to publish a literary book review

Win-Vector LLC’s Nina Zumel takes some time off to publish a literary book review: Reading Red Spectres: Russian Gothic Tales.


Nina Zumel also examines aspects of the supernatural in literature and in folk culture at her blog, multoghost.wordpress.com. She writes about folklore, ghost stories, weird fiction, or anything else that strikes her fancy. Follow her on Twitter @multoghost.

Categories: Opinion, Practical Data Science

On writing a technical book

I have been doing a lot of writing lately (the book, clients, blog, status updates, and the occasional tweet). This has made me acutely aware of how different many of these writing tasks tend to be.