In our recent note What is new for
rquery December 2019 we mentioned an ugly processing pipeline that translates into
SQL of varying size/quality depending on the query generator we use. In this note we try a near-relative of that query in the
Continue reading Better SQL Generation via the data_algebra
To make getting started with
rquery (an advanced query generator for
R) easier we have re-worked the package
README for various data-sources (including
Continue reading Getting Started With rquery
rquery talk went very well, thank you very much to the attendees for being an attentive and generous audience.
rquery at BARUG, photo credit: Timothy Liu)
I am now looking for invitations to give a streamlined version of this talk privately to groups using
R who want to work with
SQL (with databases such as PostgreSQL or big data systems such as Apache Spark).
rquery has a number of features that greatly improve team productivity in this environment (strong separation of concerns, strong error checking, high usability, specific debugging features, and high performance queries).
If your group is in the San Francisco Bay Area and using
R to work with a
SQL accessible data source, please reach out to me at firstname.lastname@example.org, I would be honored to show your team how to speed up their project and lower development costs with
rquery. If you are a big data vendor and some of your clients use
R, I am especially interested in getting in touch: our system can help
R users start working with your installation.
I would like to thank LinkedIn for letting me speak with some of their data scientists and analysts.
John Mount discussing
SQL generation at LinkedIn.
If you have a group using
R at database or
Spark scale, please reach out ( jmount at win-vector.com ). We (Win-Vector LLC) have some great new tools I’d love to speak on and share. I’d love an invite, especially if your group is in the San Francisco Bay Area.
Note: we also now have a 1/2 to 1 day on-site “Spark for R Users” training offering. Again, please reach out if your team is interested.
Win-Vector LLC recently announced the
R package, an operator based query generator.
In this note I want to share some exciting and favorable initial rquery benchmark timings.
Continue reading rquery: Fast Data Manipulation in R
This note describes a useful
replyr tool we call a "join controller" (and is part of our "R and Big Data" series, please see here for the introduction, and here for one our big data courses).
Continue reading Use a Join Controller to Document Your Work
I want to discuss a nice series of figures used to teach relational join semantics in R for Data Science by Garrett Grolemund and Hadley Wickham, O’Reilly 2016. Below is an example from their book illustrating an inner join:
Please read on for my discussion of this diagram and teaching joins. Continue reading Visualizing relational joins
We have updated the errata for Practical Data Science with R to reflect that it is no longer worth the effort to use the Java version of SQLScrewdriver as described.
We are very sorry for any confusion, trouble, or wasted effort bringing in Java software (something we are very familiar with, but forget not everybody uses) has caused readers. Also, database adapters for R have greatly improved, so we feel more confident depending on them alone. Practical Data Science with R remains an excellent book and a good resource to learn from that we are very proud of and fully support (hence errata). Continue reading Practical Data Science with R errata update: Java SQLScrewdriver replaced by R procedures and article