Categories: Exciting Techniques, math programming, Tutorials

Introducing RcppDynProg

RcppDynProg is a new Rcpp-based R package that implements simple, but powerful, table-based dynamic programming. This package can be used to optimally solve the minimum cost partition into intervals problem (described below) and is useful in building piecewise estimates of functions (shown in this note).
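To make the "minimum cost partition into intervals" problem concrete, here is a minimal Python sketch of the table-based dynamic program. It is not RcppDynProg's code or API: the function names `min_cost_interval_partition` and `sse_costs`, and the choice of "squared error around the interval mean plus a per-interval penalty" as the cost, are assumptions made purely for illustration.

```python
from typing import List, Sequence, Tuple


def min_cost_interval_partition(
    costs: Sequence[Sequence[float]],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Split indices 0..n-1 into consecutive intervals of minimum total cost.

    costs[i][j] is the (assumed given) cost of the inclusive interval i..j,
    and is only read for i <= j.
    """
    n = len(costs)
    # best[j] = minimum cost of partitioning the first j points (indices 0..j-1)
    best = [0.0] + [float("inf")] * n
    # prev[j] = start index of the last interval in that best partition
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + costs[i][j - 1]
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i
    # Walk the prev table backwards to recover the intervals.
    intervals: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        i = prev[j]
        intervals.append((i, j - 1))
        j = i
    intervals.reverse()
    return best[n], intervals


def sse_costs(y: Sequence[float], penalty: float) -> List[List[float]]:
    """Illustrative cost table: squared error around the interval mean,
    plus a fixed penalty per interval (an assumption, to discourage
    splitting every point into its own interval)."""
    n = len(y)
    costs = [[float("inf")] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            segment = y[i : j + 1]
            mean = sum(segment) / len(segment)
            costs[i][j] = sum((v - mean) ** 2 for v in segment) + penalty
    return costs
```

Calling `min_cost_interval_partition(sse_costs(y, penalty=1.0))` on a step-like `y` returns the total cost and the list of inclusive index intervals, which is the piecewise-constant flavor of the function estimates mentioned above. The double loop is quadratic in the number of points once the cost table is built.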

Categories: Uncategorized

Rotary

We try to keep this blog mostly technical and business (as we assume that is what our readers are here for).

However, this post is going to be an exception.

I’ve just got back from photographing the Rotary Club of San Francisco’s 2018 Holiday Party. We had a special guest, SF Mayor London Breed (shown here with Rotary Club of San Francisco President Rhonda Poppen).

[Photo: SF Mayor London Breed with Rotary Club of San Francisco President Rhonda Poppen]

I am proud to say I have been a member of this organization for over 10 years. It is where I do my volunteer work both in San Francisco and internationally.

In particular, I am thrilled to be supporting a number of Rotarians and Roots of Peace in their latest effort to remediate farmland in Vietnam (with the help and permission of the Vietnamese government). These people are working hard to undo some of the pain and misery of unexploded ordnance (UXO). I’ll be helping with some administrative tasks, and they will be training hundreds of farmers to move into profitable world market crops.

[Photo: IMG 0039]

Pictured above: Heidi Kuhn and Christian Kuhn of Roots of Peace.

Categories: data science, Exciting Techniques, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials

vtreat Variable Importance

vtreat’s purpose is to produce pure numeric R data.frames that are ready for supervised predictive modeling (predicting a value from other values). By ready we mean: a purely numeric data frame with no missing values and a reasonable number of columns (missing values re-encoded with indicators, and high-degree categorical variables re-encoded as effects codes or impact codes).

In this note we will discuss a small aspect of the vtreat package: variable screening.

Categories: Coding, Opinion, Programming, Tutorials

Quoting Concatenate

In our last note we used wrapr::qe() to help quote expressions. In this note we will discuss quoting and code-capturing interfaces (interfaces that capture user source code) a bit more.

Categories: Coding, Exciting Techniques, Programming, Tutorials

Reusable Pipelines in R

Pipelines in R are popular; the most popular is the magrittr pipe, as used by dplyr.

This note will discuss two advanced reusable piping systems: rquery/rqdatatable operator trees and wrapr function object pipelines. In each case we have a set of objects designed to extract extra power from the wrapr dot-arrow pipe %.>%.

Categories: data science, Exciting Techniques, Programming, Tutorials

Sharing Modeling Pipelines in R

Reusable modeling pipelines are a practical idea that gets re-developed many times in many contexts. wrapr supplies a particularly powerful pipeline notation, and a pipe-stage re-use system (notes here). We will demonstrate this with the vtreat data preparation system.

Categories: Coding, Opinion

Timing Grouped Mean Calculation in R

This note is a comment on some of the timings shared in the dplyr-0.8.0 pre-release announcement.

The original published timings were as follows:

With performance metrics: measurements are marketing. So let’s dig into the above a bit.

Categories: Opinion, Programming, Rants

Very Non-Standard Calling in R

Our group has done a lot of work with non-standard calling conventions in R.

Our tools work hard to eliminate non-standard calling (as is the purpose of wrapr::let()), or at least make it cleaner and more controllable (as is done in the wrapr dot pipe). And even so, we still get surprised by some of the side-effects and mal-consequences of the over-use of non-standard calling conventions in R.

Please read on for a recent example.