Dirk Eddelbuettel just shared an important point on software and analyses: dependencies are hard-to-manage risks.
If your software or research depends on many complex and changing packages, you have no way to establish your work is correct. This is because to establish the correctness of your work, you would need to also establish the correctness of all of the dependencies. This is worse than having non-reproducible research, as your work may have in fact been wrong even the first time.
Continue reading Software Dependencies and Risk
I am collecting here some notes on testing in R.
There seems to be a general (false) impression among non R-core developers that to run tests, R package developers need a test management system such as testthat. There is a further false impression that testthat is the only R test management system. Neither is true, as R itself has a capable testing facility in "R CMD check" (a command triggering R checks from outside of any given integrated development environment).
By a combination of skimming the R-manuals ( https://cran.r-project.org/manuals.html ) and running a few experiments I came up with a description of how R-testing actually works. And I have adapted the available tools to fit my current preferred workflow. This may not be your preferred workflow, but I have my reasons and give them below.
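As a minimal sketch of the idea that tests can be plain R scripts: "R CMD check" runs each .R file under a package's tests/ directory and reports a failure if the script stops with an error. The function add_one and the file name below are hypothetical stand-ins, not taken from the post; in a real package the function would come from library(yourpackage).

```r
# Hypothetical stand-in for a function the package under test would export.
add_one <- function(x) {
  x + 1
}

# In a real package these lines would live in a file such as
# tests/test_add_one.R (file name assumed for illustration).
# Each stopifnot() raises an error, and so fails the check, if a condition is not TRUE.
stopifnot(add_one(1) == 2)
stopifnot(isTRUE(all.equal(add_one(c(0.1, 0.2)), c(1.1, 1.2))))
stopifnot(length(add_one(numeric(0))) == 0)
```

No test management package is loaded here; base R's stopifnot() is enough to turn a wrong result into an error.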
Continue reading Unit Tests in R
Let’s try some "ugly corner cases" for data manipulation in R. Corner cases are examples where the user might be running to the edge of where the package developer intended their package to work, and thus often where things can go wrong.
Let’s see what happens when we try to stick a fork in the power-outlet.
Continue reading Data Manipulation Corner Cases
Recently ran into something interesting on the R macros/quasi-quotation/substitution/syntax front:
Romain François: “.@_lionelhenry reveals planned double curly syntax At #satRdayParis as a possible replacement, addition to !! and enquo()”
!! is no longer the last word in substitution (it certainly wasn’t the first).
Continue reading More on Macros in R
Recently Hadley Wickham prescribed pronouncing the magrittr pipe as “then” and using right-assignment as follows:
I am not sure if it is a good or bad idea. But let’s play with it a bit, and perhaps readers can submit their experience and opinions in the comments section.
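The code from the original post is not reproduced in this excerpt. As a rough sketch of the notation under discussion, the following reads left to right: take the data, "then" apply each step, and finally right-assign the result. It uses base R's native pipe |> (R 4.1 or later) as a stand-in for the magrittr pipe so that it runs without extra packages; the variable name total is made up for illustration.

```r
# Read as: take the vector, then take square roots, then sum,
# and finally right-assign the result to `total`.
c(1, 4, 9) |>
  sqrt() |>
  sum() -> total

print(total)
```

The right-assignment arrow binds more loosely than the pipe, so the whole pipeline is evaluated before the value is stored.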
Continue reading Playing With Pipe Notations
Manning has a new discount code and a free excerpt of our book Practical Data Science with R, 2nd Edition: here.
This section is elementary, but things really pick up speed later on (also available in a paid preview).
In our cdata R package and training materials we emphasize record-oriented thinking and how to design a transform control table. We now have an exciting new feature: control table keys.
The user can now control which columns of a cdata control table are the keys, including now using composite keys (that is, keys that are spread across more than one column). This is easiest to demonstrate with an example.
Continue reading cdata Control Table Keys
To make teaching R quasi-quotation easier it would be nice if R string-interpolation and quasi-quotation both used the same notation. They are related concepts. So some commonality of notation would actually be clarifying, and help teach the concepts. We will define both of the above terms, and demonstrate the relation between the two concepts.
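To make the relation concrete, here is a small sketch using only base R. It is not the notation proposed in the post: sprintf() stands in for string interpolation and bquote() stands in for quasi-quotation, and the variable names are invented for illustration.

```r
name <- "x"

# String interpolation: substitute a value into a string template.
s <- sprintf("%s + 1", name)

# Quasi-quotation: substitute a value into quoted (unevaluated) code.
# .( ) marks the part of the expression that bquote() should fill in.
e <- bquote(.(as.name(name)) + 1)

# Both routes describe the same expression, one as text and one as code.
x <- 41
from_string <- eval(parse(text = s))
from_code <- eval(e)
print(c(from_string, from_code))
```

In both cases a placeholder is replaced by a value before the result is used; the difference is whether the result is a string or a language object.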
Continue reading Make Teaching R Quasi-Quotation Easier
While working on a variation of the RcppDynProg algorithm we derived the following beautiful identity of 2 by 2 real matrices:
Here the superscript “top” denotes the transpose operation, ||.||^2_2 denotes the sum of squares norm, and the single |.| denotes the determinant.
This is derived from one of the check equations for the Moore–Penrose inverse and we have details of the derivation here, and details of the messy algebra here.
While developing the R package I took a little extra time to port the core algorithm from C++ to both R and Python. This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative “apples to apples” timings. Please read on for a summary of the results.
Continue reading Timing the Same Algorithm in R, Python, and C++