Categories: Opinion, Programming, Tutorials

Iteration and closures in R

I recently read an interesting thread on unexpected behavior in R when creating a list of functions in a loop or iteration. The issue is solved, but I am going to take the liberty to try and re-state and slow down the discussion of the problem (and fix) for clarity.

The issue is: are references or values captured during iteration?

Many users expect values to be captured. Most programming language implementations instead capture variables or references, which leads to strange aliasing issues. This is confusing (especially in R, which pushes so far in the direction of value-oriented semantics) and is best demonstrated with concrete examples.
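As a minimal sketch of the question (not necessarily the example from the thread), the R code below builds a list of functions in a `for` loop. The helper name `make_fn` is hypothetical, introduced only for illustration. Each closure in the first list refers to the loop variable itself, so all of them see its final value; the second list passes the value through a function call and forces it, so each closure gets its own copy.

```r
# Closures created in a loop share the loop variable i itself,
# not the value i held when each closure was created.
fns <- list()
for (i in 1:3) {
  fns[[i]] <- function() i
}
shared <- vapply(fns, function(f) f(), integer(1))
print(shared)    # 3 3 3: every closure sees the final value of i

# Hypothetical helper: each call creates a fresh environment holding v,
# and force(v) evaluates the argument immediately instead of lazily.
make_fn <- function(v) {
  force(v)
  function() v
}

fixed <- list()
for (i in 1:3) {
  fixed[[i]] <- make_fn(i)
}
captured <- vapply(fixed, function(f) f(), integer(1))
print(captured)  # 1 2 3: each closure kept its own value
```

The design choice here is to give each closure its own enclosing environment, which is what the function call provides; `force` guards against R's lazy argument evaluation deferring the lookup of `i` until after the loop has moved on.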



Please read on for some of the history and future of this issue.

Categories: data science, Practical Data Science, Pragmatic Data Science, Programming, Statistics, Tutorials

The Zero Bug

I am going to write about an insidious statistical, data analysis, and presentation fallacy I call “the zero bug” and the habits you need to cultivate to avoid it.



Here is the zero bug in a nutshell: common data aggregation tools often cannot “count to zero” from examples, and this causes problems. Please read on for what this means, the consequences, and how to avoid the problem.
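A small illustration of the idea, as a sketch under stated assumptions: the `sales` data and the store names below are made up for this example and do not come from the post. Counting rows per group can only report groups that appear in the data, so a store with no sales is silently missing from the summary unless the full set of groups is supplied separately.

```r
# Hypothetical data: one row per sale. Store "C" exists but sold nothing.
sales <- data.frame(store = c("A", "A", "B"), stringsAsFactors = FALSE)
all_stores <- c("A", "B", "C")

# Counting from examples alone: store "C" never appears in the result.
naive <- table(sales$store)
print(naive)

# Declaring the full set of levels up front lets the count reach zero.
complete <- table(factor(sales$store, levels = all_stores))
print(complete)
```

The second summary reports a zero for store C because the list of stores comes from outside the rows being aggregated, which is the kind of habit the post argues for.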

Categories: Administrativia, Programming, Statistics

Announcing the wrapr package for R

Recently Dirk Eddelbuettel pointed out that our R function debugging wrappers would be more convenient if they were available in a low-dependency micro-package dedicated to little else. Dirk is a very smart person, and like most R users we are deeply in his debt; so we (Nina Zumel and I) listened and immediately moved the wrappers into a new micro-package: wrapr.


Image: Friedensreich Hundertwasser

Categories: Administrativia, Statistics

My recent BARUG talk: Parametric Programming in R with replyr

I want to share an edited screencast of my rehearsal for my recent San Francisco Bay Area R Users Group talk.



Categories: Programming, Statistics, Tutorials

Evolving R Tools and Practices

One of the distinctive features of the R platform is how explicit and user-controllable everything is. This allows the style of use of R to evolve fairly rapidly. I will discuss this and end with some new notations, methods, and tools I am nominating for inclusion into your view of the evolving “current best practice style” of working with R.

Categories: Administrativia, Statistics

Going to Strata / Hadoop World 2017 San Jose?

Are you attending or considering attending Strata / Hadoop World 2017 San Jose? Are you interested in learning to use R to work with Spark and h2o? Then please consider signing up for my 3 1/2 hour workshop soon. We are about half full now, but I really want to fill the room, while making sure that people who really want to go get in.

Win-Vector LLC is partnering with RStudio to produce and present some awesome material that will allow you to perform data science at scale using R to control Spark and even h2o.

The links to the event are below. To make sure you get to participate please sign up soon!

  • Modeling big data with R, sparklyr, and Apache Spark (by RStudio and Win-Vector LLC)

    03/14/2017 1:30pm – 5:00pm PDT (210 minutes)

    Strata & Hadoop World West, San Jose Convention Center, CA; Room: LL21 C/D

    link, materials (including slides)

    Win-Vector LLC’s John Mount will teach how to use R to control big data analytics and modeling. This in-depth training will prepare you to use R, Spark, sparklyr, h2o, and rsparkling.

    The session consists of hands-on exercises with R, sparklyr, and h2o using RStudio Server Pro (generously provided by RStudio!).

    Sponsored by RStudio and Win-Vector LLC.

  • Office Hour with John Mount (Win-Vector LLC)

    03/15/2017 2:40pm – 3:20pm PDT (40 minutes)

    Strata & Hadoop World West, San Jose Convention Center, CA; Room: Table B


    Come and ask me questions about data science, machine learning, R, statistics, or whatever you like.