Python has a fairly famous design principle (from “PEP 20 — The Zen of Python”):
There should be one– and preferably only one –obvious way to do it.
R (especially once you add many packages) there is usually more than one way. As an example we will talk about the common
head(), and the
glimpse(). Continue reading There is usually more than one way in R
Our next "R and big data tip" is: summarizing big data.
We always say "if you are not looking at the data, you are not doing science"- and for big data you are very dependent on summaries (as you can’t actually look at everything).
Simple question: is there an easy way to summarize big data in
The answer is: yes, but we suggest you use the
replyr package to do so.
Win-Vector LLC has recently been teaching how to use
R with big data through
sparklyr. We have also been helping clients become productive on
R/Spark infrastructure through direct consulting and bespoke training. I thought this would be a good time to talk about the power of working with big-data using
R, share some hints, and even admit to some of the warts found in this combination of systems.
The ability to perform sophisticated analyses and modeling on “big data” with
R is rapidly improving, and this is the time for businesses to invest in the technology. Win-Vector can be your key partner in methodology development and training (through our consulting and training practices).
J. Howard Miller, 1943.
The field is exciting, rapidly evolving, and even a touch dangerous. We invite you to start using
R and are starting a new series of articles tagged “R and big data” to help you produce production quality solutions quickly.
Please read on for a brief description of our new articles series: “R and big data.” Continue reading New series: R and big data (concentrating on Spark and sparklyr)