
Win-Vector LLC announces new “big data in R” tools

Win-Vector LLC is proud to introduce two important new tool families (with documentation) in the 0.5.0 version of seplyr (also now available on CRAN):

  • partition_mutate_se() / partition_mutate_qt(): query planners/optimizers that work over dplyr::mutate() assignments. When using big-data systems through R (such as PostgreSQL or Apache Spark), these planners can make your code faster and can sequence steps to avoid critical issues: the complementary problems of overly long in-mutate dependence chains and of too many mutate steps, plus incidental bugs (all explained in the linked tutorials).
  • if_else_device(): provides a dplyr::mutate()-based simulation of per-row conditional blocks (including conditional assignment). This allows powerful imperative code (such as is often seen when porting from SAS) to be directly and legibly translated into performant dplyr::mutate() data flow code that works on Spark (via sparklyr) and databases.
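
To make the planning idea concrete, here is a minimal base-R sketch of partitioning assignments into blocks so that no expression reads a column assigned in the same block. It is not seplyr's implementation: the helpers partition_assignments() and apply_plan() are hypothetical names invented for this illustration, and the sketch assumes each column is assigned exactly once and never refers to itself.

```r
# Hypothetical helper (not part of seplyr): split named assignment
# expressions into sequential blocks so that no expression reads a
# column that is assigned in the same block.
partition_assignments <- function(exprs) {
  blocks <- list()
  remaining <- exprs
  while (length(remaining) > 0) {
    targets <- names(remaining)
    # An assignment is ready when it reads none of the pending targets.
    ready <- vapply(
      remaining,
      function(e) !any(all.vars(e) %in% targets),
      logical(1)
    )
    if (!any(ready)) stop("circular dependency among assignments")
    blocks[[length(blocks) + 1]] <- remaining[ready]
    remaining <- remaining[!ready]
  }
  blocks
}

# Hypothetical helper: apply the blocks in order. Every expression in a
# block is evaluated against the same input frame, mimicking one
# mutate step per block.
apply_plan <- function(d, plan) {
  for (block in plan) {
    new_cols <- lapply(block, function(e) eval(e, d))
    d[names(new_cols)] <- new_cols
  }
  d
}

exprs <- list(
  a = quote(x + 1),
  b = quote(a * 2),
  c = quote(x - 1),
  d = quote(b + c)
)

plan <- partition_assignments(exprs)
result <- apply_plan(data.frame(x = 1:3), plan)
```

Here a and c land in the first block, b in the second, and d in the third, so four assignments need only three steps. A real planner also has to handle reassignment of the same column and respect the original statement order, which this sketch deliberately ignores.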


Image: blacksmith working, by Jeff Kubina from Columbia, Maryland, CC BY-SA 2.0.



Using PostgreSQL in R: A quick how-to

The combination of R plus SQL offers an attractive way to work with what we call medium-scale data: data that’s perhaps too large to gracefully work with in its entirety within your favorite desktop analysis tool (whether that be R or Excel), but too small to justify the overhead of big data infrastructure. In some cases you can use a serverless SQL database that gives you the power of SQL for data manipulation, while maintaining a lightweight infrastructure.

We call this work pattern “SQL Screwdriver”: delegating data handling to a lightweight infrastructure with the power of SQL for data manipulation.

Image: Iainf, some rights reserved.

We assume for this how-to that you already have a PostgreSQL database up and running. To get PostgreSQL for Windows, OS X, or Unix, follow the instructions on the PostgreSQL downloads page. If you happen to be on a Mac, then Postgres.app provides a “serverless” (or application-oriented) install option.

For the rest of this post, we give a quick how-to on using the RPostgreSQL package to interact with Postgres databases in R.
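
As a rough preview of that workflow, the sketch below connects through DBI with the RPostgreSQL driver, copies a data frame into a table, and lets the database do an aggregation. The function name mean_by_group_pg(), the table name "example_data", the column names, and all connection settings are placeholders assumed for illustration; substitute the values for your own installation.

```r
# Hypothetical helper: push a data frame with columns `grp` and `value`
# to PostgreSQL and compute per-group means in SQL.
# Requires the DBI and RPostgreSQL packages and a running database.
mean_by_group_pg <- function(d,
                             dbname = "postgres",   # assumed database name
                             host = "localhost",    # assumed host
                             port = 5432,           # PostgreSQL default port
                             user = "postgres",     # assumed user
                             password = "") {       # assumed (empty) password
  con <- DBI::dbConnect(
    RPostgreSQL::PostgreSQL(),
    dbname = dbname, host = host, port = port,
    user = user, password = password
  )
  # Close the connection even if a later step fails.
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # Copy the local data frame into a database table (placeholder name).
  DBI::dbWriteTable(con, "example_data", d,
                    overwrite = TRUE, row.names = FALSE)

  # Let the database aggregate; only the small result returns to R.
  DBI::dbGetQuery(
    con,
    "SELECT grp, AVG(value) AS mean_value
       FROM example_data
      GROUP BY grp
      ORDER BY grp"
  )
}
```

With a database available, a call such as mean_by_group_pg(data.frame(grp = c("a", "a", "b"), value = c(1, 3, 5))) would return one row per group. Namespaced calls (DBI::, RPostgreSQL::) are used so that defining the function does not itself require the packages to be attached.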
