Posted on Categories Opinion, Programming, StatisticsTags , , , , ,

There is usually more than one way in R

Python has a fairly famous design principle (from “PEP 20 — The Zen of Python”):

There should be one– and preferably only one –obvious way to do it.

Frankly in R (especially once you add many packages) there is usually more than one way. As an example we will talk about the common R functions: str(), head(), and the tibble package‘s glimpse(). Continue reading There is usually more than one way in R

Posted on Categories StatisticsTags , , , ,

Summarizing big data in R

Our next "R and big data tip" is: summarizing big data.

We always say "if you are not looking at the data, you are not doing science"- and for big data you are very dependent on summaries (as you can’t actually look at everything).

Simple question: is there an easy way to summarize big data in R?

The answer is: yes, but we suggest you use the replyr package to do so.

Continue reading Summarizing big data in R

Posted on Categories data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, TutorialsTags , , , , , 1 Comment on Managing Spark data handles in R

Managing Spark data handles in R

When working with big data with R (say, using Spark and sparklyr) we have found it very convenient to keep data handles in a neat list or data_frame.


5465544053 8b626a09c8 b

Please read on for our handy hints on keeping your data handles neat. Continue reading Managing Spark data handles in R

Posted on Categories Administrativia, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, TutorialsTags , , 5 Comments on New series: R and big data (concentrating on Spark and sparklyr)

New series: R and big data (concentrating on Spark and sparklyr)

Win-Vector LLC has recently been teaching how to use R with big data through Spark and sparklyr. We have also been helping clients become productive on R/Spark infrastructure through direct consulting and bespoke training. I thought this would be a good time to talk about the power of working with big-data using R, share some hints, and even admit to some of the warts found in this combination of systems.

The ability to perform sophisticated analyses and modeling on “big data” with R is rapidly improving, and this is the time for businesses to invest in the technology. Win-Vector can be your key partner in methodology development and training (through our consulting and training practices).


We Can Do It

J. Howard Miller, 1943.

The field is exciting, rapidly evolving, and even a touch dangerous. We invite you to start using Spark through R and are starting a new series of articles tagged “R and big data” to help you produce production quality solutions quickly.

Please read on for a brief description of our new articles series: “R and big data.” Continue reading New series: R and big data (concentrating on Spark and sparklyr)