I am going to come-out and say it: I am emotionally done with 32 bit machines and operating systems. My sympathy for them is at an end.
I know that ARM is still 32 bit, but in that case you get something big back in exchange: the ability to deploy on smartphones and tablets. For PCs and servers 32 bit addressing’s time is long past, yet we still have to code for and regularly run into these machines and operating systems. The time/space savings of 32 bit representations is nothing compared to the loss of capability in sticking with that architecture and the wasted effort in coding around it. My work is largely data analysis in a server environment, and it is just getting ridiculous to not be able to always assume at least a 64 bit machine. Continue reading I am done with 32 bit machines
When people ask me what it means to be a data scientist, I used to answer, “it means you don’t have to hold my hand.” By which I meant that as a data scientist (a consulting data scientist), I can handle the data collection, the data cleaning and wrangling, the analysis, and the final presentation of results (both technical and for the business audience) with a minimal amount of assistance from my clients or their people. Not no assistance, of course, but little enough that I’m not interfering too much with their day-to-day job.
This used to be a key selling point, because people with all the necessary skills used to be relatively rare. This is less true now; data science is a hot new career track. Training courses and academic tracks are popping up all over the place. So there is the question: what should such courses teach? Or more to the heart of the question — what does a data scientist do, and what do they need to know?
I came across a post from Emily Willingham the other day: “Is a PhD required for Good Science Writing?”. As a science writer with a science PhD, her answer is: is it not required, and it can often be an impediment. I saw a similar sentiment echoed once by Lee Gutkind, the founder and editor of the journal Creative Nonfiction. I don’t remember exactly what he wrote, but it was something to the effect that scientists are exactly the wrong people to produce literary, accessible writing about matters scientific.
I don’t agree with Gutkind’s point, but I can see where it comes from. Academic writing has a reputation for being deliberately obscure and prolix, jargonistic. Very few people read journal papers for fun (well, except me, but I’m weird). On the other hand, a science writer with a PhD has been trained for critical thinking, and should have a nose for bullpucky, even outside their field of expertise. This can come in handy when writing about medical research or controversial new scientific findings. Any scientist — any person — is going to hype up their work. It’s the writer’s job to see through that hype.
I’m not a science writer in the sense that Dr. Willingham is. I write statistics and data science articles (blog posts) for non-statisticians. Generally, the audience that I write for is professionally interested in the topic, but aren’t necessarily experts at it. And as a writer, many of my concerns are the same as those of a popular science writer.
I want to cut through the bullpucky. I want you, the reader, to come away understanding something you thought you didn’t — or even couldn’t — understand. I want you, the analyst or data science practitioner, to understand your tools well enough to innovate, not just use them blindly. And if I’m writing about one of my innovations, I want you to understand it well enough to possibly use it, not just be awed at my supposed brilliance.
I don’t do these things perfectly; but in the process of trying, and of reading other writers with similar objectives, I’ve figured out a few things.
A recent run of too many articles on the same topic (exhibits: A, B and C) puts me in a position where I feel the need to explain my motivation. Which itself becomes yet another article related to the original topic. The explanation I offer is: this is the way mathematicians think. To us mathematicians the tension is that there are far too many observable patterns in the world to be attributed to mere chance. So our dilemma is: for which patterns/regularities should we derive some underlying law and which ones are not worth worrying about. Or which conjectures should try to work all the way to proof or counter-example? Continue reading The Mathematician’s Dilemma