RcppDynProg is a new R package that implements simple, but powerful, table-based dynamic programming. This package can be used to optimally solve the minimum cost partition into intervals problem (described below) and is useful in building piecewise estimates of functions (shown in this note).
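To make the "minimum cost partition into intervals" idea concrete, here is a minimal sketch of the table-based dynamic program. It is written in plain Python and does not use or reproduce the RcppDynProg API. The names `min_cost_partition` and `sse_cost`, the example data `y`, and the per-interval penalty are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def min_cost_partition(
    n: int, cost: Callable[[int, int], float]
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split indices 0..n-1 into consecutive intervals with minimal total cost.

    cost(i, j) is the cost of using the inclusive interval [i, j] as one piece.
    """
    # best[k]: cheapest cost of partitioning the first k items (best[0] = 0).
    best = [0.0] + [float("inf")] * n
    # back[k]: start index of the last interval in that cheapest partition.
    back = [0] * (n + 1)
    for k in range(1, n + 1):
        for i in range(k):
            candidate = best[i] + cost(i, k - 1)
            if candidate < best[k]:
                best[k] = candidate
                back[k] = i
    # Follow the back pointers to recover the intervals, last one first.
    intervals: List[Tuple[int, int]] = []
    k = n
    while k > 0:
        i = back[k]
        intervals.append((i, k - 1))
        k = i
    intervals.reverse()
    return best[n], intervals


# Hypothetical example: piecewise-constant fit of y, charging each interval
# its sum of squared errors around its own mean plus a fixed penalty of 1.0.
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]


def sse_cost(i: int, j: int) -> float:
    segment = y[i : j + 1]
    center = sum(segment) / len(segment)
    return sum((v - center) ** 2 for v in segment) + 1.0


total, intervals = min_cost_partition(len(y), sse_cost)
print(total, intervals)  # 2.0 [(0, 2), (3, 5)]
```

The per-interval penalty is what keeps the solver from splitting every point into its own interval. Without it, singleton intervals would always have zero squared error.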
We try to keep this blog mostly technical and business (as we assume that is what our readers are here for).
However, this post is going to be an exception.
I’ve just got back from photographing the Rotary Club of San Francisco’s 2018 Holiday Party. We had a special guest, SF Mayor London Breed (shown here with Rotary Club of San Francisco President Rhonda Poppen).
I am proud to say I have been a member of this organization for over 10 years. It is where I do my volunteer work both in San Francisco and internationally.
In particular I am thrilled to be supporting the efforts of a number of Rotarians and Roots of Peace in their latest effort to remediate farmland in Vietnam (with the help and permission of the Vietnamese government). These people are working hard to undo some of the pain and misery of unexploded ordnance (UXO). I’ll be helping with some administrative tasks and these people will be training hundreds of farmers to move into profitable world market crops.
Pictured above: Heidi Kuhn and Christian Kuhn of Roots of Peace.
vtreat’s purpose is to produce pure numeric data.frames that are ready for supervised predictive modeling (predicting a value from other values). By ready we mean: a purely numeric data frame with no missing values and a reasonable number of columns (missing values re-encoded with indicators, and high-degree categorical variables re-encoded by effects codes or impact codes).
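The two re-encodings mentioned above can be illustrated with a small sketch. This is not vtreat's code or API. It is a naive, in-sample Python illustration, and the function names `encode_numeric` and `impact_code` are hypothetical. Here an impact code is taken to be the mean outcome for a level minus the overall mean outcome, which is an assumption about the simplest form of the idea.

```python
from statistics import mean
from typing import List, Optional, Tuple


def encode_numeric(values: List[Optional[float]]) -> Tuple[List[float], List[float]]:
    """Fill missing entries with the observed mean and add an is-missing indicator."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if observed else 0.0
    cleaned = [fill if v is None else float(v) for v in values]
    is_missing = [1.0 if v is None else 0.0 for v in values]
    return cleaned, is_missing


def impact_code(levels: List[str], y: List[float]) -> List[float]:
    """Replace each categorical level by (mean outcome for that level) - (grand mean)."""
    grand = mean(y)
    by_level = {}
    for level, target in zip(levels, y):
        by_level.setdefault(level, []).append(target)
    codes = {level: mean(targets) - grand for level, targets in by_level.items()}
    return [codes[level] for level in levels]


cleaned, is_missing = encode_numeric([1.0, None, 3.0])
print(cleaned, is_missing)  # [1.0, 2.0, 3.0] [0.0, 1.0, 0.0]
print(impact_code(["a", "b", "a", "b"], [1.0, 3.0, 1.0, 3.0]))  # [-1.0, 1.0, -1.0, 1.0]
```

Either way, the result is one or two numeric columns per input variable with no missing values, no matter how many levels a categorical variable has.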
In this note we will discuss a small aspect of the vtreat package: variable screening.
R are popular, the most popular one being magrittr as used by
This note will discuss the advanced re-usable piping systems: rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe
Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the vtreat data preparation system.
This note is a comment on some of the timings shared in the dplyr-0.8.0 pre-release announcement.
The original published timings were as follows:
With performance metrics: measurements are marketing. So let’s dig into the above a bit.
Our group has done a lot of work with non-standard calling conventions in R. Our tools work hard to eliminate non-standard calling (as is the purpose of wrapr::let()), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). And even so, we still get surprised by some of the side-effects and mal-consequences of the over-use of non-standard calling conventions in R.
Please read on for a recent example.