Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in parallel, with the R version of the tutorial being almost identical to the Python version.
This reflects our opinion on the question “which is better for data science, R or Python?”: they are both great. So start with one, and expect to eventually work with both (if you are lucky).
Continue reading Data re-Shaping in R and in Python
I would like to talk about some of the design principles underlying the data_algebra package (and its sibling, rquery). The data_algebra package is a query generator that can act on either Pandas data frames or on SQL tables. This is discussed on the project site and in the examples directory. In this note we will set up some technical terminology that will allow us to discuss some of the underlying design decisions. These are decisions that, when made well, the user doesn't have to think much about. Discussing them at length can obscure some of their charm, but we would like to point out some features here.
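To make the idea of one operator pipeline acting on either Pandas or SQL a bit more concrete, here is a toy sketch. It is not the data_algebra API: the class `ToyPipeline` and its methods `select_rows`, `select_columns`, `to_sql`, and `transform` are hypothetical names invented for illustration. The pipeline records operations against a named table once, then either renders them as nested SQL text or applies them to an in-memory pandas DataFrame.

```python
import pandas as pd


class ToyPipeline:
    """Hypothetical toy (not data_algebra): records table operations once, renders them two ways."""

    def __init__(self, table_name, ops=()):
        self.table_name = table_name
        self.ops = tuple(ops)

    def _then(self, op):
        # Return a new pipeline instead of mutating, so partial pipelines can be reused.
        return ToyPipeline(self.table_name, self.ops + (op,))

    def select_rows(self, column, comparison, value):
        return self._then(("select_rows", column, comparison, value))

    def select_columns(self, columns):
        return self._then(("select_columns", list(columns)))

    def to_sql(self):
        # Wrap each step as a sub-query; values are inlined with repr(), fine for a demo only.
        sql = f"SELECT * FROM {self.table_name}"
        for kind, *args in self.ops:
            if kind == "select_rows":
                column, comparison, value = args
                sql = f"SELECT * FROM ({sql}) q WHERE {column} {comparison} {value!r}"
            elif kind == "select_columns":
                sql = f"SELECT {', '.join(args[0])} FROM ({sql}) q"
        return sql

    def transform(self, df):
        # Apply the same recorded steps to an in-memory pandas DataFrame.
        tests = {">": lambda s, v: s > v, "<": lambda s, v: s < v, "=": lambda s, v: s == v}
        for kind, *args in self.ops:
            if kind == "select_rows":
                column, comparison, value = args
                df = df[tests[comparison](df[column], value)]
            elif kind == "select_columns":
                df = df[args[0]]
        return df.reset_index(drop=True)


ops = ToyPipeline("d").select_rows("x", ">", 1).select_columns(["y"])
d = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})
print(ops.to_sql())
# SELECT y FROM (SELECT * FROM (SELECT * FROM d) q WHERE x > 1) q
print(ops.transform(d))
```

Because each method returns a new pipeline rather than modifying the old one, a partial pipeline can be kept and extended in several directions, and the same recorded description can be handed to either back end.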
Continue reading data_algebra/rquery as a Category Over Table Descriptions
Our goal has been to make rquery the best query generation system for R (and to make data_algebra the best query generator for Python).
Let's see what rquery is good at, and what new features are making
Continue reading What is new for rquery December 2019
Slides from my PyData2019 data_algebra lightning talk are here.
Nina Zumel had a really great article on how to prepare a nice
Keras performance plot using
I will use this example to show some of the advantages of
cdata record transform specifications.
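As a rough illustration of what a record transform specification buys you, here is a plain-pandas sketch. It is not the cdata API; the functions `rows_to_blocks` and `blocks_to_rows` are hypothetical names. A single control table, loosely modeled on the training/validation layout of a Keras training history (`loss`, `val_loss`, `acc`, `val_acc`), states which wide column feeds each cell of a tall block, and the same table drives both directions of the transform. The metric values are made up.

```python
import pandas as pd


def rows_to_blocks(wide, control, key_col, record_keys):
    """Hypothetical helper: each wide row becomes one block of tall rows, per `control`."""
    value_cols = [c for c in control.columns if c != key_col]
    pieces = []
    for _, ctl in control.iterrows():
        # One control-table row yields one row of every block.
        piece = wide[record_keys].copy()
        piece[key_col] = ctl[key_col]
        for vc in value_cols:
            piece[vc] = wide[ctl[vc]].to_numpy()
        pieces.append(piece)
    tall = pd.concat(pieces, ignore_index=True)
    return tall.sort_values(record_keys + [key_col]).reset_index(drop=True)


def blocks_to_rows(tall, control, key_col, record_keys):
    """Hypothetical inverse: each block of tall rows collapses back to one wide row."""
    value_cols = [c for c in control.columns if c != key_col]
    wide = tall[record_keys].drop_duplicates().reset_index(drop=True)
    for _, ctl in control.iterrows():
        # Pull this control row's cells and rename them to their wide column names.
        sub = tall.loc[tall[key_col] == ctl[key_col], record_keys + value_cols]
        sub = sub.rename(columns={vc: ctl[vc] for vc in value_cols})
        wide = wide.merge(sub, on=record_keys, how="left")
    return wide


# Control table: rows are the block's row labels, cells name the wide columns.
control = pd.DataFrame({
    "dataset": ["training", "validation"],
    "loss": ["loss", "val_loss"],
    "acc": ["acc", "val_acc"],
})
# Made-up per-epoch metrics in wide (one row per epoch) form.
wide = pd.DataFrame({
    "epoch": [1, 2],
    "loss": [0.5, 0.4], "val_loss": [0.6, 0.55],
    "acc": [0.8, 0.85], "val_acc": [0.75, 0.78],
})
tall = rows_to_blocks(wide, control, "dataset", ["epoch"])
back = blocks_to_rows(tall, control, "dataset", ["epoch"])
print(tall)
```

The design point is that the specification is data rather than code: the tall form puts training and validation values for the same metric in one column (convenient for plotting), and because both directions read the same control table, the round trip back to the wide form can be checked directly.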
Continue reading The Advantages of Record Transform Specifications