
Better SQL Generation via the data_algebra

In our recent note "What is new for rquery December 2019" we mentioned an ugly processing pipeline that translates into SQL of varying size and quality depending on the query generator we use. In this note we try a near-relative of that query in the data_algebra.



New rquery Vignette: Working with Many Columns

We have a new rquery vignette here: Working with Many Columns.

This is an attempt to get back to writing about how to use the package to work with data (versus the other day’s discussion of package design/implementation).

Please check it out.


data_algebra/rquery as a Category Over Table Descriptions

Introduction

I would like to talk about some of the design principles underlying the data_algebra package (and also in its sibling rquery package).

The data_algebra package is a query generator that can act on either Pandas data frames or on SQL tables. This is discussed on the project site and in the examples directory. In this note we will set up some technical terminology that will allow us to discuss some of the underlying design decisions. When these decisions are made well, the user doesn’t have to think much about them. Discussing such design decisions at length can obscure some of their charm, but we would like to point out some features here.
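To make the "one pipeline, two back ends" idea concrete, here is a minimal toy sketch. It is not the data_algebra API: the `Extend` and `Pipeline` classes, their fields, and the `run_on_rows`/`to_sql` methods are hypothetical names invented for this illustration, and plain Python dictionaries stand in for Pandas data frames so that the example needs only the standard library. The point is only that a single description of the operations can be both interpreted in memory and translated to SQL.

```python
import operator
import sqlite3
from dataclasses import dataclass

# Hypothetical toy classes for illustration; these are NOT the data_algebra API.
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


@dataclass(frozen=True)
class Extend:
    """Add a new column computed as `left <op> right` from existing columns.

    Assumes `new_col` does not collide with an existing column name.
    """
    new_col: str
    left: str
    op: str
    right: str


@dataclass(frozen=True)
class Pipeline:
    table: str      # name of the source table
    columns: tuple  # column names of the source table
    steps: tuple    # sequence of Extend steps, applied in order

    def run_on_rows(self, rows):
        """Interpret the pipeline directly on in-memory rows (list of dicts)."""
        out = [dict(r) for r in rows]
        for step in self.steps:
            fn = _OPS[step.op]
            for r in out:
                r[step.new_col] = fn(r[step.left], r[step.right])
        return out

    def to_sql(self):
        """Translate the same pipeline into nested SELECT statements."""
        sql = f'SELECT {", ".join(self.columns)} FROM {self.table}'
        cols = list(self.columns)
        for i, step in enumerate(self.steps):
            expr = f"{step.left} {step.op} {step.right} AS {step.new_col}"
            sql = f'SELECT {", ".join(cols)}, {expr} FROM ({sql}) AS t{i}'
            cols.append(step.new_col)
        return sql


rows = [{"x": 1, "y": 10}, {"x": 2, "y": 20}]
pipeline = Pipeline(
    table="d",
    columns=("x", "y"),
    steps=(Extend("z", "x", "+", "y"), Extend("w", "z", "*", "x")),
)

# Back end 1: run the pipeline on in-memory data.
in_memory = pipeline.run_on_rows(rows)

# Back end 2: generate SQL and run it in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (x INTEGER, y INTEGER)")
conn.executemany("INSERT INTO d VALUES (:x, :y)", rows)
cursor = conn.execute(pipeline.to_sql())
names = [c[0] for c in cursor.description]
from_sql = [dict(zip(names, r)) for r in cursor.fetchall()]
conn.close()
```

Because the pipeline is data rather than code tied to one engine, the same object can be checked on a small in-memory sample and then shipped to a database as SQL. The real packages are considerably richer than this toy.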



What is new for rquery December 2019

Our goal has been to make rquery the best query generation system for R (and to make data_algebra the best query generator for Python).

Let's see what rquery is good at, and what new features are making rquery better.



Slides for PyData LA 2019 vtreat Talk

Slides for PyData LA 2019 vtreat Talk are here!


Practical Data Science with R 2nd Edition now in-stock at Amazon.com!

Practical Data Science with R 2nd Edition is now in-stock at Amazon.com!


Buy it for your favorite data scientist in time for the holidays!


Slides from the PyData2019 data_algebra lightning talk

Slides from my PyData2019 data_algebra lightning talk are here.


Practical Data Science with R, 2nd Edition: Introduction Video

Nina and I have prepared a quick introduction video for Practical Data Science with R, 2nd Edition.

We are really proud of both editions of the book. This book can help an R user directly experience the data science style of working with data and machine learning techniques.

The book is available now at:

Please check it out!


Nina Zumel and John Mount speaking on vtreat at PyData LA 2019

As we have announced before, we have ported the R version of vtreat to a new Python version of vtreat.

Our latest news is: we are speaking about the Python version at PyData LA 2019 (Thursday 10:50 AM–11:35 AM in Track 2 Room).



Practical Data Science with R, 2nd Edition, IS OUT!!!!!!!

Practical Data Science with R, 2nd Edition author Dr. Nina Zumel, with a fresh author’s copy of her book!
