Posted on Categories Programming, TutorialsTags , , , Leave a comment on Piping is Method Chaining

Piping is Method Chaining

What R users now call piping, popularized by Stefan Milton Bache and Hadley Wickham, is inline function application (this is notationally similar to, but distinct from the powerful interprocess communication and concurrency tool introduced to Unix by Douglas McIlroy in 1973). In object oriented languages this sort of notation for function application has been called “method chaining” since the days of Smalltalk (~1972). Let’s take a look at method chaining in Python, in terms of pipe notation.

Continue reading Piping is Method Chaining

Posted on Categories Opinion, ProgrammingTags , , Leave a comment on Why RcppDynProg is Written in C++

Why RcppDynProg is Written in C++

The (matter of opinion) claim:

“When the use of C++ is very limited and easy to avoid, perhaps it is the best option to do that […]”

(source discussed here)

got me thinking: does our own RcppDynProg package actually use C++ in a significant way? Could/should I port it to C? Am I informed enough to use something as complicated as C++ correctly?

Continue reading Why RcppDynProg is Written in C++

Posted on Categories Opinion, Programming, TutorialsTags , , 2 Comments on Standard Evaluation Versus Non-Standard Evaluation in R

Standard Evaluation Versus Non-Standard Evaluation in R

There is a lot of unnecessary worry over “Non Standard Evaluation” (NSE) in R versus “Standard Evaluation” (SE, or standard “variables names refer to values” evaluation). This very author is guilty of over-discussing the issue. But let’s give this yet another try.

The entire difference between NSE and regular evaluation can be summed up in the following simple table (which should be clear after we work some examples).

Tbl

Continue reading Standard Evaluation Versus Non-Standard Evaluation in R

Posted on Categories Pragmatic Data Science, Programming, TutorialsTags , , 5 Comments on Tidyverse users: gather/spread are on the way out

Tidyverse users: gather/spread are on the way out

From https://twitter.com/sharon000/status/1107771331012108288:

NewImage

From https://tidyr.tidyverse.org/dev/articles/pivot.html (text by Hadley Wickham):

For some time, it’s been obvious that there is something fundamentally wrong with the design of spread() and gather(). Many people don’t find the names intuitive and find it hard to remember which direction corresponds to spreading and which to gathering. It also seems surprisingly hard to remember the arguments to these functions, meaning that many people (including me!) have to consult the documentation every time.

There are two important new features inspired by other R packages that have been advancing of reshaping in R:

  • The reshaping operation can be specified with a data frame that describes precisely how metadata stored in column names becomes data variables (and vice versa). This is inspired by the cdata package by John Mount and Nina Zumel. For simple uses of pivot_long() and pivot_wide(), this specification is implicit, but for more complex cases it is useful to make it explicit, and operate on the specification data frame using dplyr and tidyr.
  • pivot_long() can work with multiple value variables that may have different types. This is inspired by the enhanced melt() and dcast() functions provided by the data.table package by Matt Dowle and Arun Srinivasan.

If you want to work in the above way we suggest giving our cdata package a try. We named the functions pivot_to_rowrecs and unpivot_to_blocks. The idea was: by emphasizing the record structure one might eventually internalize what the transforms are doing. On the way to that we have a lot of documentation and tutorials.

Posted on Categories Opinion, Programming, TutorialsTags , , 4 Comments on Quantifying R Package Dependency Risk

Quantifying R Package Dependency Risk

We recently commented on excess package dependencies as representing risk in the R package ecosystem.

The question remains: how much risk? Is low dependency a mere talisman, or is there evidence it is a good practice (or at least correlates with other good practices)?

Continue reading Quantifying R Package Dependency Risk

Posted on Categories Opinion, ProgrammingTags , , ,

wrapr::let()

I would like to once again recommend our readers to our note on wrapr::let(), an R function that can help you eliminate many problematic NSE (non-standard evaluation) interfaces (and their associate problems) from your R programming tasks.

The idea is to imitate the following lambda-calculus idea:


let x be y in z := ( λ x . z ) y

Continue reading wrapr::let()

Posted on Categories Opinion, Programming, TutorialsTags 4 Comments on Software Dependencies and Risk

Software Dependencies and Risk

Dirk Eddelbuettel just shared an important point on software and analyses: dependencies are hard to manage risks.

If your software or research depends on many complex and changing packages, you have no way to establish your work is correct. This is because to establish the correctness of your work, you would need to also establish the correctness of all of the dependencies. This is worse than having non-reproducible research, as your work may have in fact been wrong even the first time.

Continue reading Software Dependencies and Risk

Posted on Categories Opinion, Programming, TutorialsTags , , , 2 Comments on Unit Tests in R

Unit Tests in R

I am collecting here some notes on testing in R.

There seems to be a general (false) impression among non R-core developers that to run tests, R package developers need a test management system such as RUnit or testthat. And a further false impression that testthat is the only R test management system. This is in fact not true, as R itself has a capable testing facility in "R CMD check" (a command triggering R checks from outside of any given integrated development environment).

By a combination of skimming the R-manuals ( https://cran.r-project.org/manuals.html ) and running a few experiments I came up with a description of how R-testing actually works. And I have adapted the available tools to fit my current preferred workflow. This may not be your preferred workflow, but I have and give my reasons below.

Continue reading Unit Tests in R

Posted on Categories data science, Opinion, Pragmatic Data Science, Programming, StatisticsTags , , , 3 Comments on Data Manipulation Corner Cases

Data Manipulation Corner Cases

Let’s try some "ugly corner cases" for data manipulation in R. Corner cases are examples where the user might be running to the edge of where the package developer intended their package to work, and thus often where things can go wrong.

Let’s see what happens when we try to stick a fork in the power-outlet.

Fork

Continue reading Data Manipulation Corner Cases

Posted on Categories data science, Programming, TutorialsTags , , , 1 Comment on rquery Substitution

rquery Substitution

The rquery R package has several places where the user can ask for what they have typed in to be substituted for a name or value stored in a variable.

This becomes important as many of the rquery commands capture column names from un-executed code. So knowing if something is treated as a symbol/name (which will be translated to a data.frame column name or a database column name) or a character/string (which will be translated to a constant) is important.

Continue reading rquery Substitution