replyr is an
R package that contains extensions, adaptations, and work-arounds to make remote
dplyr data sources (including big data systems such as
Spark) behave more like local data. This allows the analyst to more easily develop and debug procedures that work simultaneously across a variety of data services (in-memory data and
Spark 2 currently being the primary supported platforms).
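The idea can be sketched as follows: write one dplyr pipeline and run it against both local and remote data. The data and function names below are illustrative, and the Spark portion is shown only as commented-out code since it requires a live cluster connection.

```r
# Sketch: one dplyr procedure, intended to run unchanged on a local
# data.frame or on a remote (e.g., Spark) handle.
library(dplyr)

d <- data.frame(x = c(1, 2, 3), group = c("a", "a", "b"))

summarize_by_group <- function(data) {
  data %>%
    group_by(group) %>%
    summarize(meanx = mean(x))
}

summarize_by_group(d)  # runs on the local data.frame

# With a Spark connection the same function would apply to the remote handle:
# sc <- sparklyr::spark_connect(master = "local")
# dSpark <- dplyr::copy_to(sc, d)
# summarize_by_group(dSpark)
```

The point is that `summarize_by_group` itself contains no service-specific code; replyr's role is to paper over the places where remote sources deviate from local `data.frame` behavior.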
Recently Dirk Eddelbuettel pointed out that our
R function debugging wrappers would be more convenient if they were available in a low-dependency micro package dedicated to little else. Dirk is a very smart person, and like most
R users we are deeply in his debt; so we (Nina Zumel and myself) listened and immediately moved the wrappers into a new micro-package:
Image: Friedensreich Hundertwasser
I want to share an edited screencast of my rehearsal for my recent San Francisco Bay Area R Users Group talk:
One of the distinctive features of the
R platform is how explicit and user controllable everything is. This allows the style of use of
R to evolve fairly rapidly. I will discuss this and end with some new notations, methods, and tools I am nominating for inclusion into your view of the evolving “current best practice style” of working with
R. Continue reading Evolving R Tools and Practices
I am happy to announce a couple of exciting upcoming Win-Vector LLC public speaking engagements.
- BARUG Meetup, Tuesday February 7, 2017 ~7:50pm, Intuit, Building 20, 2600 Marine Way, Mountain View, CA. Win-Vector LLC’s John Mount will be giving a “lightning talk” (15 minutes) on R calling conventions (standard versus non-standard) and showing how to use our
replyr package to greatly improve scripting or programming over
dplyr. Some articles on
replyr can be found here.
- Strata & Hadoop World West, Tuesday March 14, 2017 1:30pm–5:00pm, San Jose Convention Center, CA, Location: LL21 C/D. Win-Vector LLC’s John Mount will teach how to use R to control big data analytics and modeling. In-depth training to prepare you to use
rsparkling. In partnership with RStudio.
Hope to see you there!
My answer is: it does work. I am not a
data.table user so I am not the one to ask if
data.table benefits from a non-standard evaluation to standard evaluation adapter such as
replyr::let. Continue reading Does replyr::let work with data.table?
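A minimal sketch of what "it works" means here, assuming replyr::let's `alias`/expression interface (the column name `x` and the variable `columnName` are stand-ins for illustration):

```r
# Sketch: using replyr::let to parameterize a data.table expression
# over a column name only known at run time.
library(data.table)
library(replyr)

dt <- data.table(x = c(1, 2, 3))

# The column to aggregate arrives as a string:
columnName <- 'x'

# let() substitutes the symbolic name COL with the actual column
# name before the data.table expression is evaluated.
let(
  alias = list(COL = columnName),
  dt[, mean(COL)]
)
```

Because `let` performs the substitution before evaluation, the same pattern works whether the inner expression uses dplyr verbs or data.table's bracket notation.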
Consider the problem of “parametric programming” in R. That is: simply writing correct code before knowing some details, such as the names of the columns your procedure will have to be applied to in the future. Our latest version of
replyr::let makes such programming easier.
Archie’s Mechanics #2 (1954) copyright Archie Publications
(edit: great news! CRAN just accepted our
replyr 0.2.0 fix release!)
Please read on for examples comparing standard notations and
replyr::let. Continue reading Comparative examples using replyr::let
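To make the comparison concrete, here is a minimal sketch assuming replyr::let's `alias`/expression interface; the data, the symbolic name `RANGECOL`, and the run-time column name are all illustrative:

```r
# Sketch: parametric programming with replyr::let.
# We want to write a dplyr summary over a column whose
# name is not known until run time.
library(dplyr)
library(replyr)

d <- data.frame(Sepal_Length = c(5.8, 5.7),
                Sepal_Width  = c(4.0, 3.8))

# The column to summarize arrives as a string:
columnName <- 'Sepal_Length'

# let() maps the symbolic name RANGECOL to the actual column name,
# then evaluates the dplyr expression with that substitution applied.
let(
  alias = list(RANGECOL = columnName),
  d %>% summarize(maxv = max(RANGECOL))
)
```

The standard-notation alternative would hard-code `Sepal_Length` into the pipeline; `let` lets the same pipeline text serve for any column supplied at run time.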
It’s a common situation to have data from multiple processes in a “long” data format, for example a table with a column such as
process_that_produced_measurement. It’s also natural to split that data apart to analyze or transform it per-process, and then to bring the results of that data processing back together for comparison. Such a work pattern is called “Split-Apply-Combine,” and we discuss several R implementations of this pattern here. In this article we show a simple example of one such implementation,
replyr::gapply, from our latest package, replyr.
Illustration by Boris Artzybasheff. Image: James Vaughn, some rights reserved.
The example task is to evaluate how several different models perform on the same classification problem, in terms of deviance, accuracy, precision and recall. We will use the “default of credit card clients” data set from the UCI Machine Learning Repository.
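The Split-Apply-Combine pattern described above can be sketched in plain base R (replyr::gapply supplies a dplyr-service version of this; the base-R sketch below, with made-up process names and measurements, is only to show the shape of the pattern):

```r
# Sketch of Split-Apply-Combine in base R (illustrative data).
d <- data.frame(
  process_that_produced_measurement = c("m1", "m1", "m2", "m2"),
  measurement = c(1, 2, 3, 5)
)

# Split: one data.frame per process.
pieces <- split(d, d$process_that_produced_measurement)

# Apply: summarize each per-process piece independently.
results <- lapply(names(pieces), function(p) {
  data.frame(process = p,
             mean_measurement = mean(pieces[[p]]$measurement))
})

# Combine: bind the per-process summaries back into one table.
do.call(rbind, results)
```

The attraction of a `gapply`-style operator is that the split, apply, and combine steps collapse into one call, and can be pushed to a remote dplyr data service rather than running only on local data.frames.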