I recently got back from Strata West 2017 (where I ran a very well received workshop on
Spark). One thing that really stood out for me at the exhibition hall was
datashader from Continuum Analytics.
I had the privilege of having Peter Wang himself demonstrate
datashader for me and answer a few of my questions.
I am so excited about
datashader capabilities I literally will not wait for the functionality to be exposed in
rbokeh. I am going to leave my usual
rmarkdown world and dust off
Jupyter Notebook just to use
datashader plotting. This is worth trying, even for diehard
R users. Continue reading Datashader is a big deal
Our book Practical Data Science with R has just been reviewed in Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory (ACM SIGACT) News by Dr. Allan M. Miller (U.C. Berkeley)!
The book is half off at Manning March 21st 2017 using the following code (please share/Tweet):
Deal of the Day March 21: Half off my book Practical Data Science with R. Use code
Please read on for links and excerpts from the review. Continue reading Practical Data Science with R: ACM SIGACT News Book Review and Discount!
He even picked the right image:
I have new short screencast up: using R and RStudio to install and experiment with Apache Spark.
More material from my recent Strata workshop Modeling big data with R, sparklyr, and Apache Spark can be found here.
We are very sorry for any confusion, trouble, or wasted effort bringing in Java software (something we are very familiar with, but forget not everybody uses) has caused readers. Also, database adapters for R have greatly improved, so we feel more confident depending on them alone. Practical Data Science with R remains an excellent book and a good resource to learn from that we are very proud of and fully support (hence errata). Continue reading Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article
sigr is a simple
R package that conveniently formats a few statistics and their significance tests. This allows the analyst to use the correct test no matter what modeling package or procedure they use.
replyr is an
R package that contains extensions, adaptions, and work-arounds to make remote
dplyr data sources (including big data systems such as
Spark) behave more like local data. This allows the analyst to more easily develop and debug procedures that simultaneously work on a variety of data services (in-memory
Spark2 currently being the primary supported platforms).