Categories: Opinion, Programming, Tutorials

Software Dependencies and Risk

Dirk Eddelbuettel just shared an important point on software and analyses: dependencies are hard-to-manage risks.

If your software or research depends on many complex and changing packages, you have no way to establish your work is correct. This is because to establish the correctness of your work, you would need to also establish the correctness of all of the dependencies. This is worse than having non-reproducible research, as your work may have in fact been wrong even the first time.


Categories: Opinion, Programming, Tutorials

Unit Tests in R

I am collecting here some notes on testing in R.

There seems to be a general (false) impression among non-R-core developers that, to run tests, R package developers need a test management system such as RUnit or testthat, and a further false impression that testthat is the only R test management system. Neither is true: R itself has a capable testing facility in "R CMD check" (a command triggering R checks from outside of any given integrated development environment).

By a combination of skimming the R manuals ( https://cran.r-project.org/manuals.html ) and running a few experiments, I came up with a description of how R testing actually works, and I have adapted the available tools to fit my current preferred workflow. This may not be your preferred workflow, but I have my reasons and give them below.
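As a minimal sketch of the base facility mentioned above: "R CMD check" runs the R scripts it finds in a package's tests/ directory, and an error in any of them is reported as a failure. The file name tests/test_scale.R and the function scale01() below are hypothetical, defined inline only so the sketch runs by itself; this is not necessarily the workflow the full post settles on.

```r
# Hypothetical contents of tests/test_scale.R in some package.
# In a real package scale01() would come from the package itself
# (loaded via library() at the top of the script); it is defined
# inline here only so this sketch is self-contained.
scale01 <- function(x) {
  rng <- range(x)
  (x - rng[1]) / (rng[2] - rng[1])
}

result <- scale01(c(2, 4, 6))

# stopifnot() signals an error if any condition is not TRUE,
# and an error in a tests/ script makes R CMD check report a failure.
stopifnot(
  isTRUE(all.equal(result, c(0, 0.5, 1))),
  min(result) == 0,
  max(result) == 1
)
```

No test management package is involved here: the script is plain R, and the pass/fail signal is simply whether it runs to completion.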


Categories: data science, Opinion, Pragmatic Data Science, Programming, Statistics

Data Manipulation Corner Cases

Let’s try some "ugly corner cases" for data manipulation in R. Corner cases are examples where the user might be running to the edge of where the package developer intended their package to work, and thus often where things can go wrong.

Let’s see what happens when we try to stick a fork in the power-outlet.



Categories: Coding, Opinion, Tutorials

More on Macros in R

I recently ran into something interesting on the R macros/quasi-quotation/substitution/syntax front:


Romain François: “.@_lionelhenry reveals planned double curly syntax At #satRdayParis as a possible replacement, addition to !! and enquo()”

It appears !! is no longer the last word in substitution (it certainly wasn’t the first).


Categories: Coding, Opinion

Playing With Pipe Notations

Recently Hadley Wickham prescribed pronouncing the magrittr pipe as “then” and using right-assignment as follows:

[Image of the recommendation; not reproduced here.]

I am not sure if it is a good or bad idea. But let’s play with it a bit, and perhaps readers can submit their experience and opinions in the comments section.
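For readers who want something concrete to play with, here is a small sketch of the reading order being discussed. It assumes R 4.1 or newer and uses the base R native pipe `|>` as a stand-in for the magrittr pipe, so that no package is needed; the numbers and variable names are made up for illustration.

```r
# Base R native pipe (R >= 4.1) used as a stand-in for magrittr's %>%.
# Read each pipe as "then"; the right-assignment arrow names the result last.
c(1, 4, 9) |>
  sqrt() |>
  sum() -> total

# The same computation in conventional nested, left-assigned form.
total_nested <- sum(sqrt(c(1, 4, 9)))

# Both spellings produce the same value.
stopifnot(total == total_nested)
print(total)
```

The piped version reads top to bottom in the order the work happens ("take the values, then take square roots, then sum, then store as total"), which is the argument for right-assignment; the cost is that the name of the result is no longer the first thing the reader sees.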


Categories: Administrativia, data science, Opinion, Statistics

PDSwR2 Free Excerpt and New Discount Code

Manning has a new discount code and a free excerpt of our book Practical Data Science with R, 2nd Edition: here.

This section is elementary, but things really pick up speed later on (also available in a paid preview).

Categories: Exciting Techniques, Opinion, Tutorials

cdata Control Table Keys

In our cdata R package and training materials we emphasize record-oriented thinking and how to design a transform control table. We now have an exciting new feature: control table keys.

The user can now control which columns of a cdata control table are the keys, including composite keys (that is, keys spread across more than one column). This is easiest to demonstrate with an example.


Categories: Opinion, Programming, Tutorials

Make Teaching R Quasi-Quotation Easier

To make teaching R quasi-quotation easier, it would be nice if R string-interpolation and quasi-quotation both used the same notation. They are related concepts, so some commonality of notation would be clarifying and would help teach them. We will define both terms and demonstrate the relation between the two concepts.
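To make the relation concrete before the definitions, here is a minimal sketch using only base R. It is an illustration under our own choice of tools (sprintf() for string interpolation, bquote() for quasi-quotation, and the built-in iris data), not the notation the full post proposes.

```r
# String interpolation: substitute a value into a string template.
column <- "Sepal.Length"
msg <- sprintf("mean of %s", column)

# Quasi-quotation: substitute a value into quoted (unevaluated) code.
# bquote() quotes its argument, except for the parts wrapped in .().
expr <- bquote(mean(iris[[.(column)]]))

print(msg)          # a string with the column name filled in
print(expr)         # an unevaluated call with the column name filled in
value <- eval(expr) # the call can be evaluated later
stopifnot(identical(value, mean(iris$Sepal.Length)))
```

In both cases a template is held unevaluated while marked slots are filled in from the surrounding environment; the difference is whether the template is text or code, and each currently marks its slots with a different notation (`%s` versus `.()`).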


Categories: Mathematics, Opinion, Tutorials

A Beautiful 2 by 2 Matrix Identity

While working on a variation of the RcppDynProg algorithm we derived the following beautiful identity of 2 by 2 real matrices:

Here the superscript “top” denotes the transpose operation, ||.||^2_2 denotes the sum-of-squares norm, and the single |.| denotes the determinant.

This is derived from one of the check equations for the Moore–Penrose inverse and we have details of the derivation here, and details of the messy algebra here.

Categories: Coding, Opinion, Tutorials

Timing the Same Algorithm in R, Python, and C++

While developing the RcppDynProg R package I took a little extra time to port the core algorithm from C++ to both R and Python.

This means I can time the exact same algorithm implemented nearly identically in each of these three languages, and so extract some comparative “apples to apples” timings. Please read on for a summary of the results.
