While developing the `RcppDynProg` `R` package I took a little extra time to port the core algorithm from `C++` to both `R` and `Python`.

This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative "apples to apples" timings. Please read on for a summary of the results.

The algorithm in question is the general dynamic programming solution to the "minimum cost partition into intervals" problem. As coded in `C++` it uses one-time allocation of large tables and then `for`-loops and index chasing to fill in the dynamic programming table solution. The `C++` code is given here.
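For readers who want the shape of that style of dynamic program, here is a minimal sketch in Python. The interface (a precomputed `costs` matrix and the function name) is my own illustrative choice, not the actual `RcppDynProg` code: `best[j]`, the cheapest partition of the first `j` points, is built up by trying every possible start `i` for the last interval.

```python
def min_cost_partition(costs):
    """Minimum-cost partition of points 0..n-1 into contiguous intervals.

    costs[i][j] is the (hypothetical) cost of taking points i..j (inclusive)
    as a single interval.  Returns (total cost, cut positions incl. 0 and n).
    """
    n = len(costs)
    best = [0.0] * (n + 1)   # best[j]: min cost of partitioning the first j points
    prev = [0] * (n + 1)     # prev[j]: start of the last interval in that solution
    for j in range(1, n + 1):
        best[j] = float("inf")
        for i in range(j):   # last interval covers points i .. j-1
            c = best[i] + costs[i][j - 1]
            if c < best[j]:
                best[j] = c
                prev[j] = i
    cuts = [n]               # walk prev[] back from n to recover the boundaries
    j = n
    while j > 0:
        j = prev[j]
        cuts.append(j)
    return best[n], cuts[::-1]
```

Note the nested index-chasing loops: this is exactly the access pattern the post times in all three languages.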

I then directly transliterated (or line-for-line translated) this code into `R` (code here) and `Python` (code here). Both of these implementations are very direct translations of the `C++` solution, so they are possibly not what somebody starting in `R` or `Python` would design. So really we are coding in an imperative `C` style in `R` and `Python`. To emphasize the shallowness of the port I deliberately left the semicolons from the `C++` in the `R` port. The `Python` can be taken to be equally "un-Pythonic" (for example, we are using `for` loops and not list comprehensions).
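To make that contrast concrete, here is a small hypothetical fragment (not from the actual port) showing the transliterated C style next to the idiomatic equivalent:

```python
# The transliterated C style: explicit index chasing, as a C programmer writes it.
vals = [1.0, 2.0, 3.0]
total = 0.0
n = len(vals)
i = 0
while i < n:
    total = total + vals[i] * vals[i]
    i = i + 1

# The same computation as idiomatic Python: no explicit indexes at all.
total_idiomatic = sum(v * v for v in vals)
```

Both compute the same sum of squares; the ports in this post are written in the first style throughout.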

That being said, we now have very similar code to compare in all three languages. We can summarize the timings (details here and here) as follows.

problem | solution language | time in seconds |
---|---|---|
500 point partition into intervals dynamic program | R | 21 |
500 point partition into intervals dynamic program | C++ (from R via Rcpp) | 0.088 |
500 point partition into intervals dynamic program | Python | 39 |

Notice for this example `C++` is 240 times faster than `R`, and `R` is almost twice as fast as `Python`.

Neither `R` nor `Python` is optimized for the type of index-chasing this dynamic programming solution depends on. So we also took a look at a simpler problem: computing the PRESS statistic, which is easy to vectorize (vectorization being the preferred way of writing efficient code in `R` and `Python`). When we compare all three languages on this problem we see the following.

problem | solution method | time in seconds |
---|---|---|
3,000,000 point PRESS statistic calculation | R scalar code | 3.4 |
3,000,000 point PRESS statistic calculation | Rcpp scalar code | 0.26 |
3,000,000 point PRESS statistic calculation | R vectorized code | 0.35 |
3,000,000 point PRESS statistic calculation | Python vectorized (`numpy`) code | 0.21 |

The timing details can be found here and here.
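To illustrate the scalar-versus-vectorized distinction behind those rows, here is a hedged sketch of the inner kernel of a PRESS-style statistic: a sum of squared residuals between observations `y` and held-out predictions `yhat`. The function names are mine, and the post's actual code differs in details:

```python
import numpy as np

def press_scalar(y, yhat):
    """C-style loop: accumulate squared residuals one index at a time."""
    total = 0.0
    for i in range(len(y)):
        d = y[i] - yhat[i]
        total += d * d
    return total

def press_vectorized(y, yhat):
    """Idiomatic numpy: one whole-vector expression, no Python-level loop."""
    return float(np.sum((y - yhat) ** 2))
```

On large vectors the `numpy` version pushes the per-element loop down into compiled code, which is the story behind the scalar/vectorized gap in the table above.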

Ignoring the `R` scalar solution (which is *too* direct a translation from `C++` to `R`, but a stepping stone to the `R` vectorized solution, as we discuss here), we see vectorized `Python` is now about 1.6 times faster than the vectorized `R`, and even 1.2 times faster than the `C++` (probably not due to `Rcpp`, but instead driven by my choice of container class in the `C++` code).

Obviously different code (and per-language tuning and optimization) will give different results. But the above is consistent with our general experience with `R`, `Python`, and `C++` in production.

In conclusion: `R` and `Python` are in fact much slower than `C++` for direct scalar manipulation (single values, indexes, and pointers). However, `R` and `Python` are effective *glue* languages that can be fast when they are orchestrating operations over higher-level abstractions (vectors, databases, data frames, Spark, TensorFlow, or Keras).

Did you consider Fortran?

I imagine Fortran would be faster still (with native handling of arrays instead of object containers). The issue is `Rcpp` is so convenient for linking with `R` (it manages so much of the boilerplate). By the way, we don't actually have a strong opinion on this. Most of our public work has been in R, but Python is the fastest growing part of our current practice.

How does the timing compare to pure compiled C++ code?

I’m definitely getting lots of signals that Python is the DS stack of the future. This is unfortunate in some ways. On the upside, the package quality is generally higher. On the downside, a lot of basic stats stuff is missing.

FWIW, were `numpy` and R linked against the same LAPACK back end? I think R’s approach to squeezing data over to Fortran might also account for the 1.6, which is quite good all things considered.

Hey Scott,

Great to hear from you; long time no see. The LAPACK question is a good one. I don’t know the answer. I am pretty sure R was using the vanilla LAPACK that comes with CRAN R (so probably not one of the high-performance ones). The Python was using whatever LAPACK Anaconda decided to use; I took no additional steps, so that is probably also the vanilla one.

As for R/Python: the majority of our consulting is still R, and we are working on the second edition of our book Practical Data Science with R. However, the Python part of our consulting and training is growing faster. It makes sense: once a general-purpose language gets into data science use, it is going to do well. Interested to see if Python has as many "big fish in a small pond" issues.

Yes, Anaconda def comes with a different shared object for LAPACK.

I was a huge fan of Python and `numpy` in the 90s when Livermore first put NumPy together; ported a ton of Mathematica and MATLAB code to it. Nowadays though, it feels weirdly wordy and primitive. Probably J is turning me into a programming bum, but I don’t feel this way when fooling around in R or JavaScript.

I always liked that Moore-Penrose thing; some day I’m going to use CUR decomposition to do something neat.