Categories: Exciting Techniques, math programming, Tutorials

## Introducing RcppDynProg

`RcppDynProg` is a new `Rcpp`-based `R` package that implements simple, but powerful, table-based dynamic programming. This package can be used to optimally solve the minimum-cost partition-into-intervals problem (described below) and is useful in building piecewise estimates of functions (shown in this note).
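To make the idea concrete, here is a minimal base-`R` sketch of table-based dynamic programming for the interval partition problem. It is not the `RcppDynProg` implementation or API: the function name `min_cost_partition`, the cost matrix layout, and the per-interval penalty of 1 are all assumptions made for illustration.

```r
# Sketch only: plain R, not the RcppDynProg API.
# costs[i, j] is an assumed, user-supplied cost of covering positions i..j
# with a single interval (only entries with i <= j are used).
min_cost_partition <- function(costs) {
  n <- nrow(costs)
  best <- numeric(n + 1)     # best[k + 1] = cheapest cost to cover 1..k
  last_start <- integer(n)   # start of the final interval in that solution
  best[1] <- 0
  for (k in seq_len(n)) {
    # try every possible start i for the interval that ends at k
    cand <- vapply(
      seq_len(k),
      function(i) best[i] + costs[i, k],
      numeric(1))
    last_start[k] <- which.min(cand)
    best[k + 1] <- cand[last_start[k]]
  }
  # walk back through the table to recover the interval starts
  starts <- integer(0)
  k <- n
  while (k >= 1) {
    starts <- c(last_start[k], starts)
    k <- last_start[k] - 1L
  }
  list(cost = best[n + 1], starts = starts)
}

# Example: piecewise-constant fit, squared error plus a penalty of 1 per interval
y <- c(1, 1, 1, 5, 5, 5)
n <- length(y)
costs <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in i:n) {
    seg <- y[i:j]
    costs[i, j] <- sum((seg - mean(seg))^2) + 1
  }
}
res <- min_cost_partition(costs)
print(res)
```

On this toy data the sketch splits the series into the intervals 1..3 and 4..6. The penalty term is what stops the solution from using one interval per point.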

Categories: Uncategorized

## Rotary

We try to keep this blog mostly technical and business (as we assume that is what our readers are here for).

However, this post is going to be an exception.

I’ve just got back from photographing the Rotary Club of San Francisco’s 2018 Holiday Party. We had a special guest, SF Mayor London Breed (shown here with Rotary Club of San Francisco President Rhonda Poppen).

I am proud to say I have been a member of this organization for over 10 years. It is where I do my volunteer work both in San Francisco and internationally.

In particular, I am thrilled to be supporting the efforts of a number of Rotarians and Roots of Peace in their latest effort to remediate farmland in Vietnam (with the help and permission of the Vietnamese government). These people are working hard to undo some of the pain and misery of unexploded ordnance (UXO). I’ll be helping with some administrative tasks, and these people will be training hundreds of farmers to move into profitable world market crops.

Pictured above: Heidi Kuhn and Christian Kuhn of Roots of Peace.

## vtreat Variable Importance

`vtreat`’s purpose is to produce purely numeric `R` `data.frame`s that are ready for supervised predictive modeling (predicting a value from other values). By ready we mean: a purely numeric data frame with no missing values and a reasonable number of columns (missing values re-encoded with indicators, and high-degree categorical variables re-encoded by effects codes or impact codes).
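As a rough illustration of the "missing values re-encoded with indicators" idea, here is a small base-`R` sketch. It does not call `vtreat`; the column names `x_clean` and `x_isBAD` and the choice of mean replacement are assumptions made for this example.

```r
# Sketch only: hand-rolled missing-value treatment in base R, not vtreat itself.
d <- data.frame(x = c(1, NA, 3, NA, 5))

# indicator column: 1 where the original value was missing, 0 otherwise
x_isBAD <- as.numeric(is.na(d$x))

# replace missing values with the mean of the observed values
# (one simple, assumed choice of replacement value)
x_clean <- ifelse(is.na(d$x), mean(d$x, na.rm = TRUE), d$x)

treated <- data.frame(x_clean = x_clean, x_isBAD = x_isBAD)
print(treated)
```

The result is purely numeric with no missing values, and the indicator column keeps the information that a value was originally absent.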

In this note we will discuss a small aspect of the `vtreat` package: variable screening.

Categories: Coding, Exciting Techniques, Programming, Tutorials

## Reusable Pipelines in R

Pipelines in `R` are popular, the most popular one being the `magrittr` pipe as used by `dplyr`.

This note will discuss two advanced re-usable piping systems: `rquery`/`rqdatatable` operator trees and `wrapr` function object pipelines. In each case we have a set of objects designed to extract extra power from the `wrapr` dot-arrow pipe `%.>%`.
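For readers new to dot-style pipes, the following toy operator sketches the basic idea in base `R`: evaluate the right-hand expression with `.` bound to the left-hand value. The operator name `%dot%` is hypothetical, and this is not how `wrapr` implements `%.>%`; it leaves out the object-oriented dispatch that the systems above rely on.

```r
# Sketch only: a toy dot pipe in base R, not wrapr's %.>% implementation.
`%dot%` <- function(lhs, rhs) {
  rhs_expr <- substitute(rhs)   # capture the right-hand side unevaluated
  # evaluate it with `.` bound to the left-hand value,
  # falling back to the caller's environment for everything else
  eval(rhs_expr, envir = list(. = lhs), enclos = parent.frame())
}

total <- c(1, 4, 9) %dot% sqrt(.) %dot% sum(.)
print(total)
```

Because the dot is written explicitly, each stage is an ordinary `R` expression, which is part of what makes such stages easy to store and re-use.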

Categories: data science, Exciting Techniques, Programming, Tutorials

## Sharing Modeling Pipelines in R

Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. `wrapr` supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the `vtreat` data preparation system.

Categories: Coding, Opinion

## Timing Grouped Mean Calculation in R

This note is a comment on some of the timings shared in the dplyr-0.8.0 pre-release announcement.

The original published timings were as follows:

With performance metrics, measurements are marketing. So let’s dig into the above a bit.
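For anyone who wants to run their own comparison, here is a minimal base-`R` sketch of timing a grouped mean two different ways. It does not reproduce the `dplyr` announcement's benchmark; the data size, column names, and the two methods compared are assumptions chosen for illustration.

```r
# Sketch only: timing two base-R grouped mean calculations on synthetic data.
set.seed(2018)
d <- data.frame(
  g = sample(letters, 1e5, replace = TRUE),  # grouping column
  x = runif(1e5))                            # value column

t_tapply <- system.time(m1 <- tapply(d$x, d$g, mean))
t_aggregate <- system.time(m2 <- aggregate(x ~ g, data = d, FUN = mean))

# confirm both methods agree before comparing their speed
stopifnot(isTRUE(all.equal(as.numeric(m1[as.character(m2$g)]), m2$x)))

print(rbind(tapply = t_tapply, aggregate = t_aggregate)[, "elapsed"])
```

A single `system.time()` call is noisy, so treat the numbers as rough. Repeating each measurement and varying the number of rows and groups gives a fairer picture.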

Categories: Opinion, Programming, Rants

## Very Non-Standard Calling in R

Our group has done a lot of work with non-standard calling conventions in `R`.

Our tools work hard to eliminate non-standard calling (as is the purpose of `wrapr::let()`), or at least make it cleaner and more controllable (as is done in the `wrapr` dot pipe). And even so, we still get surprised by some of the side-effects and mal-consequences of the over-use of non-standard calling conventions in `R`.
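As one small example of the kind of surprise meant here, the base-`R` sketch below contrasts a non-standard call (`subset()`, which captures column names from the code itself) with a standard-evaluation equivalent. The data frames and variable names are made up for illustration, and the example uses only base `R`, not `wrapr`.

```r
# Sketch only: non-standard versus standard column reference in base R.
d <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))

# non-standard: `x` is captured from the code and looked up in d
r1 <- subset(d, x > 3)

# standard: the column name is an ordinary value held in a variable
col <- "x"
r2 <- d[d[[col]] > 3, , drop = FALSE]

# the surprise: when the column is absent, subset() silently falls back
# to a variable of the same name in the calling environment
x <- 100
d2 <- data.frame(y = 1:3)
surprise <- subset(d2, x > 3)   # no error; every row is kept

print(nrow(surprise))
```

The standard form fails loudly if the column is missing (`d2[["x"]]` is `NULL`), whereas the non-standard form quietly answers a different question.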