My basic video review of the PyCharm integrated development environment for Python, with Anaconda and Jupyter/iPython integration. I liked the IDE extensions enough to pay for them early in my evaluation. Highly recommended for data science projects; at the very least, try one of the open-source or trial versions.
Python actually has a large number of very capable integrated development environments, some of which are specifically tailored for data science. Please read on for a short list of tools, and my recommendations for a specific data-science-in-Python toolchain.
Nina and I have been sending out drafts of our book Practical Data Science with R 2nd Edition for technical review. A few of the reviews came back from reviewers who described themselves with variations of:
Senior Business Analyst for COMPANYNAME. I have been involved in presenting graphs of data for many years.
To us this reads as somebody with deep experience, confidence, and a bit of humility. They do something technical and valuable, but because they understand it they do not consider it to be arcane magic.
In this note we describe what might happen if such a person (or a junior version of such a person) acquires one or two technical books.
A rigorous hands-on introduction to data science for software engineers.
Win Vector LLC is now offering a 4-day on-site intensive data science course. The course targets software engineers familiar with Python and introduces them to the basics of current data science practice. This is designed as an interactive in-person (not remote or video) course.
The user can now control which columns of a cdata control table are the keys, including using composite keys (that is, keys spread across more than one column). This is easiest to demonstrate with an example.
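cdata is an R package, so its actual interface is not shown here; but the composite-key idea itself is easy to sketch in Python with pandas. In this hypothetical example, each logical record is spread across several rows of a "blocks" table, and the key that says which value a row carries spans two columns (`group` and `stat`) rather than one. Pivoting on that column pair recovers one row per record:

```python
import pandas as pd

# Hypothetical data in "block" form: each record (one per id) is spread
# over several rows, and the key identifying which value a row carries
# is composite -- it spans the two columns "group" and "stat".
blocks = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 2],
    "group": ["train", "train", "test", "test"] * 2,
    "stat":  ["mean", "sd"] * 4,
    "value": [0.5, 0.1, 0.4, 0.2, 0.6, 0.15, 0.5, 0.25],
})

# Pivot on the composite key: the pair (group, stat) jointly determines
# the destination column of each value.
rowrecs = blocks.pivot_table(index="id", columns=["group", "stat"], values="value")

# Flatten the resulting MultiIndex columns into names like "train_mean".
rowrecs.columns = ["_".join(c) for c in rowrecs.columns]
rowrecs = rowrecs.reset_index()
print(rowrecs)
```

This is only an analogue: cdata expresses the same mapping declaratively through the control table, which is what makes the transform reusable and invertible.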
My BARUG rquery talk went very well, thank you very much to the attendees for being an attentive and generous audience.
(John teaching rquery at BARUG, photo credit: Timothy Liu)
I am now looking for invitations to give a streamlined version of this talk privately to groups using R who want to work with SQL (with databases such as PostgreSQL or big data systems such as Apache Spark). rquery has a number of features that greatly improve team productivity in this environment (strong separation of concerns, strong error checking, high usability, specific debugging features, and high performance queries).
If your group is in the San Francisco Bay Area and using R to work with a SQL-accessible data source, please reach out to me at firstname.lastname@example.org; I would be honored to show your team how to speed up their project and lower development costs with rquery. If you are a big data vendor and some of your clients use R, I am especially interested in getting in touch: our system can help R users start working with your installation.
Many data scientists (and even statisticians) often suffer under one of the following misapprehensions:
They believe a technique doesn’t work in their current situation (when in fact it does), leading to useless precautions and missed opportunities.
They believe a technique does work in their current situation (when in fact it does not), leading to failed experiments or incorrect results.
I feel this happens less often if you are working with observable and composable tools of the proper scale. Somewhere between monolithic all-in-one systems and ad-hoc one-off coding is a cognitive sweet spot where great work can be done.