The development version of my new R package seplyr is performing much better in practical applications with dplyr 0.7.* than even I (the seplyr package author) expected.
I think I have found a very good set of trade-offs, and I have now spent significant time writing documentation and examples.
I wish such a package had existed weeks ago, and that I had started using this approach in my own client work then. If you are already a dplyr user, I strongly suggest trying seplyr in your own analysis projects.
Win-Vector LLC has recently been teaching how to use R with big data through Spark and sparklyr. We have also been helping clients become productive on R/Spark infrastructure through direct consulting and bespoke training. I thought this would be a good time to talk about the power of working with big data using R, share some hints, and even admit to some of the warts found in this combination of systems.
The ability to perform sophisticated analyses and modeling on “big data” with R is rapidly improving, and now is the time for businesses to invest in the technology. Win-Vector can be your key partner in methodology development and training through our consulting and training practices.
The field is exciting, rapidly evolving, and even a touch dangerous. We invite you to start using Spark through R, and we are starting a new series of articles tagged “R and big data” to help you produce production-quality solutions quickly.