Here is an update on what we have been up to at Win-Vector LLC, and a reminder of some of our current offerings. It has been busy lately (and that is good).
Our current professional service offerings continue to be data science consulting (helping companies extract value from their data and data infrastructure) and on-site corporate training. We have been honored to recently deliver our training to teams at Salesforce and Genentech.
In blogging we have found that people respond positively to articles in series. Along those lines we have been writing more and organizing more of our articles into series. Some recent examples include:
Our differential privacy mini-series:
- A Simpler Explanation of Differential Privacy: Quick explanation of epsilon-differential privacy, and an introduction to an algorithm for safely reusing holdout data, recently published in Science (Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth, “The reusable holdout: Preserving validity in adaptive data analysis”, Science, vol 349, no. 6248, pp. 636-638, August 2015).
- Using differential privacy to reuse training data: Specifically, how differential privacy helps you build efficient encodings of categorical variables with many levels from your training data without introducing undue bias into downstream modeling.
- A simple differentially private-ish procedure: The bootstrap as an alternative to Laplace noise to introduce privacy.
Our model validation series:
Our A/B testing series:
- Wald’s graphical sequential inspection procedure
- A dynamic programming solution to A/B test design
- Why does designing a simple A/B test seem so complicated?
- A clear picture of power and significance in A/B tests
- Bandit Formulations for A/B Tests: Some Intuition
- Bayesian/loss-oriented: New video course: Campaign Response Testing
Our major series:
- Statistics to English translation. This series tries to find vibrant applications and explanations of standard good statistical practices, to make them more approachable to the non-statistician.
- Statistics as it should be. This series tries to cover cutting-edge machine learning techniques, and then adapt and explain them in traditional statistical terms.
- R as it is. This series tries to teach the statistical programming language R “warts and all” so we can see it as the versatile and powerful data science tool that it is.
Our (pay) data science series:
Some free materials related to the above include:
- Our free gradient boosting lecture.
- All examples from Practical Data Science with R (in both project-oriented and code-snippet-oriented views).
- All examples (code/data) from Introduction to Data Science, organized per lecture.
Our plan is more consulting, more courses, more on-site training, more conferences, and much more writing. If you want to work with us on any of these, please get in touch!