Win-Vector LLC has recently been teaching how to use R with big data through Spark and sparklyr. We have also been helping clients become productive on R/Spark infrastructure through direct consulting and bespoke training. I thought this would be a good time to talk about the power of working with big data using R, share some hints, and even admit to some of the warts found in this combination of systems.
The ability to perform sophisticated analyses and modeling on “big data” with R is rapidly improving, and this is the time for businesses to invest in the technology. Win-Vector can be your key partner in methodology development and training (through our consulting and training practices).
The field is exciting, rapidly evolving, and even a touch dangerous. We invite you to start using Spark through R and are starting a new series of articles tagged “R and big data” to help you produce production-quality solutions quickly.
Recently Dirk Eddelbuettel pointed out that our R function debugging wrappers would be more convenient if they were available in a low-dependency micro-package dedicated to little else. Dirk is a very smart person, and like most R users we are deeply in his debt; so we (Nina Zumel and I) listened and immediately moved the wrappers into a new micro-package: wrapr.
Are you attending or considering attending Strata / Hadoop World 2017 San Jose? Are you interested in learning to use R to work with Spark and h2o? Then please consider signing up for my 3 1/2 hour workshop soon. We are about half full now, but I really want to fill the room, while making sure that people who really want to go get in.
Win-Vector LLC is partnering with RStudio to produce and present some awesome material that will allow you to perform data science at scale using R to control Spark and even h2o.
The links to the event are below. To make sure you get to participate please sign up soon!
Modeling big data with R, sparklyr, and Apache Spark (by RStudio and Win-Vector LLC)
03/14/2017 1:30pm – 5:00pm PDT (210 minutes)
Strata & Hadoop World West, San Jose Convention Center, CA; Room: LL21 C/D
link, materials (including slides)
Win-Vector LLC’s John Mount will teach how to use R to control big data analytics and modeling. This is in-depth training to prepare you to use R, Spark, sparklyr, h2o, and rsparkling.
The session will consist of hands-on exercises with R, sparklyr, and h2o using RStudio Server Pro (generously provided by RStudio!).
Sponsored by RStudio and
Office Hour with John Mount (Win-Vector LLC)
03/15/2017 2:40pm – 3:20pm PDT (40 minutes)
Strata & Hadoop World West, San Jose Convention Center, CA; Room: Table B
Come and ask me questions about data science, machine learning, R, statistics, or whatever you like.