There are substantial differences between ad-hoc analyses (be they: machine learning research, data science contests, or other demonstrations) and production worthy systems. Roughly: ad-hoc analyses have to be correct only at the moment they are run (and often once they are correct, that is the last time they are run; obviously the idea of reproducible research is an attempt to raise this standard). Production systems have to be durable: they have to remain correct as models, data, packages, users, and environments change over time.
Demonstration systems need merely glow in bright light among friends; production systems must be correct, even alone in the dark.
“Character is what you are in the dark.”
I have found: to deliver production worthy data science and predictive analytic systems, one has to develop per-team and per-project field tested recommendations and best practices. This is necessary even when, or especially when, these procedures differ from official doctrine.
dplyrusers who had such a need, and wanted such extensions.
dplyrusers who did not have such a need ("we always know the column names").
dplyrusers who found the then-current fairly complex "underscore" and
lazyevalsystem sufficient for the task.
Needing name substitution is a problem an advanced full-time
R user can solve on their own. However a part-time
R would greatly benefit from a simple, reliable, readable, documented, and comprehensible packaged solution. Continue reading Let’s Have Some Sympathy For The Part-time R User