The R package wrapr now has a neat new feature: “wrapr_applicable”.
This feature allows objects to declare a surrogate function to stand in for the object in wrapr pipelines. It is a powerful technique and allowed us to quickly implement a convenient new ad hoc query mode for rquery.
A small effort in making a package “wrapr aware” appears to have a fairly large payoff.
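wrapr's actual interface is documented in the package itself; purely as a conceptual sketch (the attribute name below is invented, not wrapr's real mechanism), the idea of an object declaring a surrogate function to stand in for it looks something like this:

```r
# Conceptual sketch only: not wrapr's actual implementation.
# The idea: an object carries a surrogate function that is called
# in its place when the object is used in a pipeline.
apply_surrogate <- function(value, obj) {
  f <- attr(obj, "surrogate_fn")  # invented attribute name
  if (is.null(f)) stop("object declares no surrogate function")
  f(value, obj)
}

# an object that stands in for the operation "add this amount"
adder <- structure(
  list(amount = 5),
  surrogate_fn = function(value, obj) value + obj$amount
)

apply_surrogate(10, adder)  # 15
```

The payoff of the real feature is the same as in the sketch: the pipeline code stays generic, while each object supplies its own behavior.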
I am excited to share a new deep learning model performance trajectory graph.
Here is an example produced from a Keras model in R, plotted with ggplot2:
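The post's graph itself is not reproduced here; as a rough sketch of how such a trajectory plot can be built (with made-up history data standing in for what a Keras fit would return), one might write:

```r
library(ggplot2)

# made-up training history standing in for Keras fit output
history <- data.frame(
  epoch   = rep(1:10, times = 2),
  loss    = c(seq(1.0, 0.20, length.out = 10),   # training loss
              seq(1.0, 0.35, length.out = 10)),  # validation loss
  dataset = rep(c("training", "validation"), each = 10)
)

p <- ggplot(history, aes(x = epoch, y = loss, color = dataset)) +
  geom_line() +
  ggtitle("Model performance trajectory")
p
```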
Continue reading Plotting Deep Learning Model Performance Trajectories
For some time we have been teaching R users: "when working with wide tables on Spark or on databases, narrow to the columns you really want to work with early in your analysis."
The idea behind the advice is: working with fewer columns makes for quicker queries.
photo: Jacques Henri Lartigue 1912
The issue arises because wide tables (200 to 1000 columns) are quite common in big-data analytics projects. Often these are "denormalized marts" that are used to drive many different projects. For any one project only a small subset of the columns may be relevant in a calculation.
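As a small local illustration (a toy data.frame with invented column names standing in for a wide remote table), the advice amounts to selecting the needed columns before any further work:

```r
library(dplyr)

# toy stand-in for a wide table; imagine hundreds more columns
d <- data.frame(
  user_id         = c(1, 1, 2),
  purchase_amount = c(10, 0, 5),
  unused_col      = c("a", "b", "c")
)

result <- d %>%
  select(user_id, purchase_amount) %>%   # narrow to needed columns early
  filter(purchase_amount > 0) %>%
  summarize(total = sum(purchase_amount))

result$total  # 15
```

On Spark or a database the `select()` is folded into the generated query, so the unneeded columns are never moved at all, which is where the speed-up comes from.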
Continue reading How to Greatly Speed Up Your Spark Queries
Was enjoying Gabriel’s article Pipes in R Tutorial For Beginners and wanted to call attention to a few more pipes in R (not all for beginners).
Continue reading More Pipes in R
Win-Vector LLC is proud to introduce two important new tool families (with documentation) in the 0.5.0 version of seplyr (also now available on CRAN):

partition_mutate_qt(): a family of query planners/optimizers that work over dplyr::mutate() assignments. When using big-data systems through R (such as PostgreSQL or Apache Spark), these planners can make your code faster and sequence steps to avoid critical issues: the complementary problems of overly long in-mutate dependence chains, too many mutate steps, and the incidental bugs they cause (all explained in the linked tutorials).
if_else_device(): provides a dplyr::mutate()-based simulation of per-row conditional blocks (including conditional assignment). This allows powerful imperative code (such as is often seen when porting from SAS) to be directly and legibly translated into performant dplyr::mutate() data-flow code that works on Spark (via sparklyr) and on databases.
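seplyr's own documentation gives the exact if_else_device() calling convention; as a sketch of the kind of dplyr::mutate() data flow such a device targets (plain dplyr here, not seplyr's API), a per-row conditional assignment becomes:

```r
library(dplyr)

d <- data.frame(x = c(-2, 0, 3))

# the per-row "if/else block" expressed as mutate data flow
d2 <- d %>%
  mutate(
    cond = x > 0,                              # the per-row condition
    y    = if_else(cond, "positive", "other")  # conditional assignment
  )

d2$y  # "other" "other" "positive"
```

Because every step is a mutate, the same code translates to SQL and runs on remote data.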
Image by Jeff Kubina from Columbia, Maryland, CC BY-SA 2.0. Continue reading Win-Vector LLC announces new “big data in R” tools
We have been writing a lot on higher-order data transforms lately:
What I want to do now is "write a bit more, so I finally feel I have been concise."
Continue reading Arbitrary Data Transforms Using cdata
I have just released some simple RStudio add-ins that are great for creating keyboard shortcuts when working with pipes in R.
You can install the add-ins from here (which also includes both installation instructions and use instructions/examples).
Just wrote a new R article: “Data Wrangling at Scale” (using Dirk Eddelbuettel’s tint template). Please check it out.
We have just released a major update of the cdata R package to CRAN.
If you work with R and data, now is the time to check out the cdata package. Continue reading Update on coordinatized or fluid data
Our article "Let’s Have Some Sympathy For The Part-time R User" includes two points:
- Sometimes you have to write parameterized or re-usable code.
- The methods for doing this should be easy and legible.
The first point feels abstract, until you find yourself wanting to re-use code on new projects. As for the second point: I feel the wrapr package is the easiest, safest, most consistent, and most legible way to achieve maintainable code re-use in R.
In this article we will show how wrapr makes code-rewriting even easier with its new let x=x automation.
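The basic wrapr::let() substitution, before any automation, looks like this (a minimal example along the lines of the package documentation; the variable names are invented):

```r
library(wrapr)

y_value <- 7

# let() rewrites the expression, replacing the placeholder X
# with the concrete name "y_value" before evaluating it
let(
  c(X = "y_value"),
  X + 1
)
# 8
```

The mapping c(X = "y_value") is the part that parameterizes otherwise hard-coded names, which is what makes code re-use legible.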
Continue reading Let X=X in R