Posted on Categories data science, Programming, TutorialsTags , , 1 Comment on New wrapr R pipeline feature: wrapr_applicable

New wrapr R pipeline feature: wrapr_applicable

The R package wrapr now has a neat new feature: “wrapr_applicable”.

Wraprs

This feature allows objects to declare a surrogate function to stand in for the object in wrapr pipelines. It is a powerful technique and allowed us to quickly implement a convenient new ad hoc query mode for rquery.

A small effort in making a package “wrapr aware” appears to have a fairly large payoff.

Posted on Categories Exciting Techniques, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, TutorialsTags , , , 2 Comments on Plotting Deep Learning Model Performance Trajectories

Plotting Deep Learning Model Performance Trajectories

I am excited to share a new deep learning model performance trajectory graph.

Here is an example produced based on Keras in R using ggplot2:

Unknown Continue reading Plotting Deep Learning Model Performance Trajectories

Posted on Categories data science, Exciting Techniques, Pragmatic Data Science, Pragmatic Machine Learning, Programming, Statistics, TutorialsTags , , , , Leave a comment on How to Greatly Speed Up Your Spark Queries

How to Greatly Speed Up Your Spark Queries

For some time we have been teaching R users "when working with wide tables on Spark or on databases: narrow to the columns you really want to work with early in your analysis."

The idea behind the advice is: working with fewer columns makes for quicker queries.


speed

photo: Jacques Henri Lartigue 1912

The issue arises because wide tables (200 to 1000 columns) are quite common in big-data analytics projects. Often these are "denormalized marts" that are used to drive many different projects. For any one project only a small subset of the columns may be relevant in a calculation.

Continue reading How to Greatly Speed Up Your Spark Queries

Posted on Categories Programming, Statistics, TutorialsTags , , 3 Comments on More Pipes in R

More Pipes in R

Was enjoying Gabriel’s article Pipes in R Tutorial For Beginners and wanted call attention to a few more pipes in R (not all for beginners).

Continue reading More Pipes in R

Posted on Categories Coding, data science, Exciting Techniques, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, TutorialsTags , , , , , , , , 1 Comment on Win-Vector LLC announces new “big data in R” tools

Win-Vector LLC announces new “big data in R” tools

Win-Vector LLC is proud to introduce two important new tool families (with documentation) in the 0.5.0 version of seplyr (also now available on CRAN):

  • partition_mutate_se() / partition_mutate_qt(): these are query planners/optimizers that work over dplyr::mutate() assignments. When using big-data systems through R (such as PostgreSQL or Apache Spark) these planners can make your code faster and sequence steps to avoid critical issues (the complementary problems of too long in-mutate dependence chains, of too many mutate steps, and incidental bugs; all explained in the linked tutorials).
  • if_else_device(): provides a dplyr::mutate() based simulation of per-row conditional blocks (including conditional assignment). This allows powerful imperative code (such as often seen in porting from SAS) to be directly and legibly translated into performant dplyr::mutate() data flow code that works on Spark (via Sparklyr) and databases.


Blacksmith working

Image by Jeff Kubina from Columbia, Maryland – [1], CC BY-SA 2.0, Link

Continue reading Win-Vector LLC announces new “big data in R” tools

Posted on Categories data science, Pragmatic Data Science, Programming, Statistics, TutorialsTags , , 3 Comments on Arbitrary Data Transforms Using cdata

Arbitrary Data Transforms Using cdata

We have been writing a lot on higher-order data transforms lately:

Cdata

What I want to do now is "write a bit more, so I finally feel I have been concise."

Continue reading Arbitrary Data Transforms Using cdata

Posted on Categories Programming, Statistics, TutorialsTags , , , , , , 2 Comments on RStudio Keyboard Shortcuts for Pipes

RStudio Keyboard Shortcuts for Pipes

I have just released some simple RStudio add-ins that are great for creating keyboard shortcuts when working with pipes in R.

You can install the add-ins from here (which also includes both installation instructions and use instructions/examples).

RStudio Logo Blue Gradient

Wraprs BizarroPipe Logo

Posted on Categories Pragmatic Data Science, Pragmatic Machine Learning, Programming, Statistics, TutorialsTags , , , , , , ,

Data Wrangling at Scale

Just wrote a new R article: “Data Wrangling at Scale” (using Dirk Eddelbuettel’s tint template).

Fd

Please check it out.

Posted on Categories Administrativia, Statistics, TutorialsTags , , , , , 8 Comments on Update on coordinatized or fluid data

Update on coordinatized or fluid data

We have just released a major update of the cdata R package to CRAN.

Cdata

If you work with R and data, now is the time to check out the cdata package. Continue reading Update on coordinatized or fluid data

Posted on Categories Coding, Opinion, Statistics, TutorialsTags , , , ,

Let X=X in R

Our article "Let’s Have Some Sympathy For The Part-time R User" includes two points:

  • Sometimes you have to write parameterized or re-usable code.
  • The methods for doing this should be easy and legible.

The first point feels abstract, until you find yourself wanting to re-use code on new projects. As for the second point: I feel the wrapr package is the easiest, safest, most consistent, and most legible way to achieve maintainable code re-use in R.

In this article we will show how wrapr makes code-rewriting even easier with its new let x=x automation.


411gJqs4qlL

Let X=X

Continue reading Let X=X in R