
John Mount speaking on rquery and rqdatatable

rquery and rqdatatable are new R packages for data wrangling, either at scale (in databases or big data systems such as Apache Spark) or in memory. The packages speed up both execution (through optimizations) and development (through a good mental model and up-front error checking) for data wrangling tasks.



Win-Vector LLC’s John Mount will be speaking on the rquery and rqdatatable packages at the East Bay R Language Beginners Group on Tuesday, August 7, 2018 (Oakland, CA).



Upcoming Talks

I (Nina Zumel) will be speaking at the Women who Code Silicon Valley meetup on Thursday, October 27.

The talk is called Improving Prediction using Nested Models and Simulated Out-of-Sample Data.

In this talk I will discuss nested predictive models. These are models that predict an outcome or dependent variable (called y) using additional submodels that have also been built with knowledge of y. Practical applications of nested models include “the wisdom of crowds”, prediction markets, variable re-encoding, ensemble learning, stacked learning, and superlearners.

Nested models can improve prediction performance relative to single models, but they introduce a number of undesirable biases and operational issues, and when used improperly they are statistically unsound. However, modern practitioners have made effective, correct use of these techniques. In my talk I will give concrete examples of nested models, how they can fail, and how to fix the failures. The solutions we will discuss include advanced data partitioning, simulated out-of-sample data, and ideas from differential privacy. The theme of the talk is that with proper techniques, these powerful methods can be safely used.
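To make the failure mode concrete, here is a minimal sketch (not the code from the talk) of one nested-model step: re-encoding a many-level categorical variable as a per-level mean of y. It is written in plain Python, and it assumes that "simulated out-of-sample data" can be read as cross-validation-style out-of-fold coding. The function names `impact_code_naive` and `impact_code_out_of_fold`, the fold count, and the synthetic data are all hypothetical choices made for illustration.

```python
import random
from statistics import mean

def impact_code_naive(levels, y):
    """Encode each row by its level's mean of y, using ALL rows (including the row itself)."""
    grand = mean(y)
    by_level = {}
    for lv, yi in zip(levels, y):
        by_level.setdefault(lv, []).append(yi)
    return [mean(by_level[lv]) - grand for lv in levels]

def impact_code_out_of_fold(levels, y, n_folds=3, seed=0):
    """Encode each row using only rows from the OTHER folds, so a row never sees its own y."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    coded = [0.0] * n
    for held in folds:
        held_set = set(held)
        train = [i for i in range(n) if i not in held_set]
        grand = mean(y[i] for i in train)
        by_level = {}
        for i in train:
            by_level.setdefault(levels[i], []).append(y[i])
        for i in held:
            ys = by_level.get(levels[i])
            # Levels unseen in the other folds get a neutral code of 0.
            coded[i] = (mean(ys) - grand) if ys else 0.0
    return coded

def correlation(a, b):
    """Pearson correlation of two equal-length numeric lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

# Synthetic data: the categorical variable is pure noise, independent of y.
rng = random.Random(2016)
n = 600
levels = ["lev%d" % rng.randrange(200) for _ in range(n)]
y = [rng.gauss(0.0, 1.0) for _ in range(n)]

naive_r = correlation(impact_code_naive(levels, y), y)
oof_r = correlation(impact_code_out_of_fold(levels, y), y)
print("naive coding, correlation with y on training rows:", round(naive_r, 3))
print("out-of-fold coding, correlation with y:           ", round(oof_r, 3))
```

Because the variable carries no signal, an honest encoding should show little relation to y. The naive coding tends to look strongly predictive on the training rows only because each row's own y leaked into its code, while the out-of-fold coding should stay much closer to zero. A downstream model fit on the naive coding would inherit that spurious signal.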

John Mount and I will also be giving a workshop called A Unified View of Model Evaluation at ODSC West 2016 on November 4 (the premium workshop sessions), and November 5 (the general workshop sessions).

We will present a unified framework for predictive model construction and evaluation. Using this perspective we will work through crucial issues from classical statistical methodology, large data treatment, variable selection, ensemble methods, and all the way through stacking/super-learning. We will present R code demonstrating principled techniques for preparing data, scoring models, estimating model reliability, and producing decisive visualizations. In this workshop we will share example data, methods, graphics, and code.

I’m looking forward to these talks, and I hope some of you will be able to attend.


Upcoming Win-Vector LLC appearances

Win-Vector LLC will be presenting on statistically validating models using R and data science at:

We will share code and examples.

Registration required (and Strata is a paid conference). Please Tweet/forward. We hope to see you soon!



Win-Vector LLC’s John Mount at Strata + Hadoop World October 2014

Win-Vector LLC’s John Mount will be speaking at Strata + Hadoop World 2014 this month. Please attend my panel on data inventories (a key driver of data science project success) and attend my “Practical Data Science with R” book office hour (get your book signed!). Thank you to both O’Reilly Media, Inc. and Waterline Data Science for making this possible.


Current schedule/location details after the click.