Nina and I have been sending out drafts of our book Practical Data Science with R 2nd Edition for technical review. A few of the reviews came back from reviewers that described themselves with variations of:
Senior Business Analyst for COMPANYNAME. I have been involved in presenting graphs of data for many years.
To us this reads as somebody with deep experience, confidence, and bit of humility. They do something technical and valuable, but because they understand it they do not consider it to be arcane magic.
In this note we describe might can happen if such a person (or if a junior version of such a person) acquires 1 or 2 technical books.
A rigorous hands-on introduction to data science for software engineers.
Win Vector LLC is now offering a 4 day on-site intensive data science course. The course targets software engineers familiar with Python and introduces them to the basics of current data science practice. This is designed as an interactive in-person (not remote or video) course.
The user can now control which columns of a cdata control table are the keys, including now using composite keys (that is keys that are spread across more than one column). This is easiest to demonstrate with an example.
My BARUGrquery talk went very well, thank you very much to the attendees for being an attentive and generous audience.
(John teaching rquery at BARUG, photo credit: Timothy Liu)
I am now looking for invitations to give a streamlined version of this talk privately to groups using R who want to work with SQL (with databases such as PostgreSQL or big data systems such as Apache Spark). rquery has a number of features that greatly improve team productivity in this environment (strong separation of concerns, strong error checking, high usability, specific debugging features, and high performance queries).
If your group is in the San Francisco Bay Area and using R to work with a SQL accessible data source, please reach out to me at email@example.com, I would be honored to show your team how to speed up their project and lower development costs with rquery. If you are a big data vendor and some of your clients use R, I am especially interested in getting in touch: our system can help R users start working with your installation.
Many data scientists (and even statisticians) often suffer under one of the following misapprehensions:
They believe a technique doesn’t work in their current situation (when in fact it does), leading to useless precautions and missed opportunities.
They believe a technique does work in their current situation (when in fact it does not), leading to failed experiments or incorrect results.
I feel this happens less often if you are working with observable and composable tools of the proper scale. Somewhere between monolithic all in one systems, and ad-hoc one-off coding is a cognitive sweet spot where great work can be done.
We are pleased to announce a new free e-book from Manning Publications: Exploring Data Science. Exploring Data Science is a collection of five chapters hand picked by John Mount and Nina Zumel, introducing you to various areas in data science and explaining which methodologies work best for each.
We are pleased to release a new free data science video lecture: Debugging R code using R, RStudio and wrapper functions. In this 8 minute video we demonstrate the incredible power of R using wrapper functions to catch errors for later reproduction and debugging. If you haven’t tried these techniques this will really improve your debugging game.