vtreat is an excellent way to prepare data for machine learning, statistical inference, and predictive analytic projects. If you are an
R user we strongly suggest you incorporate
vtreat into your projects. Continue reading Upcoming data preparation and modeling article series
Somebody nice reached out and gave us this wonderful feedback on our new Supervised Learning in R: Regression (paid) video course.
Thanks for a wonderful course on DataCamp on
Random forest. I was struggling with
Vtreathas made my life easy now :).
Please check them out (hint:
vtreat is our favorite).
I think I have hit a very good set of trade-offs, and I have now spent significant time creating documentation and examples.
I wish there had been such a package weeks ago, and that I had started using this approach in my own client work at that time. If you are already a
dplyr user I strongly suggest trying
seplyr in your own analysis projects.
Our free video course Campaign Response Testing is no longer published on Udemy. It remains available for free on YouTube with all source code available from GitHub. I’ll try to correct bad links as I find them.
Please read on for the reasons. Continue reading Campaign Response Testing no longer published on Udemy
Win-Vector LLC has recently been teaching how to use
R with big data through
sparklyr. We have also been helping clients become productive on
R/Spark infrastructure through direct consulting and bespoke training. I thought this would be a good time to talk about the power of working with big-data using
R, share some hints, and even admit to some of the warts found in this combination of systems.
The ability to perform sophisticated analyses and modeling on “big data” with
R is rapidly improving, and this is the time for businesses to invest in the technology. Win-Vector can be your key partner in methodology development and training (through our consulting and training practices).
J. Howard Miller, 1943.
The field is exciting, rapidly evolving, and even a touch dangerous. We invite you to start using
R and are starting a new series of articles tagged “R and big data” to help you produce production quality solutions quickly.
Please read on for a brief description of our new articles series: “R and big data.” Continue reading New series: R and big data (concentrating on Spark and sparklyr)
Our book Practical Data Science with R has just been reviewed in Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory (ACM SIGACT) News by Dr. Allan M. Miller (U.C. Berkeley)!
The book is half off at Manning March 21st 2017 using the following code (please share/Tweet):
Deal of the Day March 21: Half off my book Practical Data Science with R. Use code
Please read on for links and excerpts from the review. Continue reading Practical Data Science with R: ACM SIGACT News Book Review and Discount!
I have new short screencast up: using R and RStudio to install and experiment with Apache Spark.
More material from my recent Strata workshop Modeling big data with R, sparklyr, and Apache Spark can be found here.
We are very sorry for any confusion, trouble, or wasted effort bringing in Java software (something we are very familiar with, but forget not everybody uses) has caused readers. Also, database adapters for R have greatly improved, so we feel more confident depending on them alone. Practical Data Science with R remains an excellent book and a good resource to learn from that we are very proud of and fully support (hence errata). Continue reading Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article