We are pleased to release a new free data science video lecture: Debugging R code using R, RStudio and wrapper functions. In this 8 minute video we demonstrate the incredible power of R using wrapper functions to catch errors for later reproduction and debugging. If you haven’t tried these techniques this will really improve your debugging game.
Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. The idea is: we sacrifice some of the flexibility and composability inherent to ggplot2 in R for a menu of prescribed presentation solutions (which we are sharing on Github).
For example the plot below showing both an observed discrete empirical distribution (as stems) and a matching theoretical distribution (as bars) is a built in “one liner.”
A corporate site called NPM decided to remove control of a project called “Kik” from its author and give it to a company that claimed to own the trademark on “Kik.” This isn’t actually how trademark law works or we would see the Coca-Cola Company successfully saying we can’t call certain types of coal “coke” (though it is the sort of world the United States’s “Digital Millennium Copyright Act” assumes).
The author of “Kik” decided since he obviously never had true control of the distribution of his modules distributed through NPM he would attempt to remove them (see here). This is the type of issue you worry about when you think about freedoms instead of mere discounts. We are thinking more about at this as we had to recently “re-sign” an arbitrary altered version of Apple’s software license just to run “git status” on our own code.
Tons of code broke because it is currently more stylish to include dependencies than to write code.
Egg is on a lot of faces when it is revealed one of the modules that is so critical to include is something called “leftpad.”
NPM forcibly re-published some modules to try and mitigate the damage.
The R functions base::sample and base::sample.int are functions that include extra “conveniences” that seem to have no purpose beyond encouraging grave errors. In this note we will outline the problem and a suggested work around. Obviously the R developers are highly skilled people with good intent, and likely have no choice in these matters (due to the need for backwards compatibility). However, that doesn’t mean we can’t take steps to write safer and easier to debug code.
A common complaint from new users of R is: the string processing notation is ugly.
Using paste(,,sep='') to concatenate strings seems clumsy.
You are never sure which regular expression dialect grep()/gsub() are really using.
Remembering the difference between length() and nchar() is initially difficult.
As always things can be improved by using additional libraries (for example: stringr). But this always evokes Python’s “There should be one– and preferably only one –obvious way to do it” or what I call the “rule 42” problem: “if it is the right way, why isn’t it the first way?”
At this moment the King, who had been for some time busily writing in his note-book, cackled out `Silence!' and read out from his book, `Rule Forty-two. All persons more than a mile high to leave the court.'
Everybody looked at Alice.
`I'm not a mile high,' said Alice.
`You are,' said the King.
`Nearly two miles high,' added the Queen.
`Well, I shan't go, at any rate,' said Alice: `besides, that's not a regular rule: you invented it just now.'
`It's the oldest rule in the book,' said the King.