We have two public appearances coming up in the next few weeks:
Workshop at ODSC, San Francisco – November 14
Both of us will be giving a two-hour workshop called Preparing Data for Analysis using R: Basic through Advanced Techniques. We will cover the key issues in this important but often neglected aspect of data science: what can go wrong, and how to fix it. This is part of the Open Data Science Conference (ODSC) at the Marriott Waterfront in Burlingame, California, November 14-15. If you are attending this conference, we look forward to seeing you there!
You can find an abstract for the workshop, along with links to software and code you can download ahead of time, here.
An Introduction to Differential Privacy as Applied to Machine Learning: Women in ML/DS – December 2
I (Nina) will give a talk to the Bay Area Women in Machine Learning & Data Science Meetup group, on applying differential privacy for reusable hold-out sets in machine learning. The talk will also cover the use of differential privacy in effects coding (what we’ve been calling “impact coding”) to reduce the bias that can arise from the use of nested models. Information about the talk, and the meetup group, can be found here.
We’re looking forward to these upcoming appearances, and we hope you can make one or both of them.
We’ve just finished off a series of articles on some recent research results applying differential privacy to improve machine learning. Some of these results are pretty technical, so we thought it was worth working through concrete examples. And some of the original results are locked behind academic journal paywalls, so we’ve tried to touch on the highlights of the papers, and to play around with variations of our own.
Our R code and experiments are available on GitHub here, so you can try some experiments and variations yourself.
A bit of text we are proud to steal from our good friend Joseph Rickert:
Then, for some very readable background material on SVMs I recommend section 13.4 of Applied Predictive Modeling and sections 9.3 and 9.4 of Practical Data Science with R by Nina Zumel and John Mount. You will be hard pressed to find an introduction to kernel methods and SVMs that is as clear and useful as this last reference.
For more on SVMs see the original article on the Revolution Analytics blog.
As readers have surely noticed, the Win-Vector LLC blog isn’t a stream of short notes, but a collection of long technical articles. It is the only way we can properly treat topics of consequence.
What not everybody may have noticed is that a number of these articles are organized into series for deeper comprehension. The key series include:
- Statistics to English translation.
This series tries to find vibrant applications and explanations of standard good statistical practices, to make them more approachable to the non-statistician.
- Statistics as it should be.
This series tries to cover cutting-edge machine learning techniques, and then adapt and explain them in traditional statistical terms.
- R as it is.
This series tries to teach the statistical programming language R “warts and all” so we can see it as the versatile and powerful data science tool that it is.
To get a taste of what we are up to in our writing, please check out our blog highlights and these series. For deeper treatments of more operational topics, also check out our book Practical Data Science with R.
Or, if you have something particular you need solved, consider engaging us at Win-Vector LLC for data science consulting and/or training.
The Win-Vector blog is provided free of charge (and free of outside advertising) by the researchers at Win-Vector LLC in their spare time. We have been using WordPress for a long time for the blog, and have just now upgraded our corporate site to use WordPress for content management.
So please check out our site. Excuse our dust: we are moving a few things around.
If you need any data science and/or R consulting or training (or know somebody who might) please reach out to us: email@example.com.
Remember these awful signs all over the web?
Win-Vector LLC is a consultancy founded in 2007 that specializes in research, algorithms, data-science, and training. (The name is an attempt at a mathematical pun.)
Win-Vector LLC can complete your high-value project quickly (some examples), and train your data science team to work much more effectively. Our consultants include the authors of Practical Data Science with R and also the video course Introduction to Data Science. We now offer on-site custom master classes in data science and R.
Please reach out to us at firstname.lastname@example.org for research, consulting, or training.
Follow us on Twitter (@WinVectorLLC), and sharpen your skills by following our technical blog (link, RSS).
Win-Vector LLC is proud to announce the R data science value pack: 50% off our video course Introduction to Data Science (available at Udemy) and 30% off Practical Data Science with R (from Manning). Pick any combination of video, e-book, and/or print book you want. Instructions below.
Please share and Tweet!
Win-Vector LLC’s Nina Zumel and John Mount are proud to announce their new data science video course Introduction to Data Science is now available on Udemy.