
WVPlots 1.1.2 on CRAN

I have put a new release of the WVPlots package up on CRAN. This release adds palette and/or color controls to most of the plotting functions in the package.

WVPlots was originally a catch-all package of ggplot2 visualizations that we at Win-Vector tended to use repeatedly, and wanted to turn into “one-liners.” A consequence of this is that the older visualizations had our preferred color schemes hard-coded in. More recent additions to the package sometimes had palette or color controls, but not in a consistent way. Making color controls more consistent has been a “todo” for a while—one that I’d been putting off. A recent request from user Brice Richard (thanks Brice!) has pushed me to finally make the changes.

Most visualizations in the package that color-code by group now have a palette argument that takes the name of a Brewer palette for the graph; Dark2 is usually the default. To use the ggplot2 default palette, or to set an alternative palette, such as viridis or a manually specified color scheme, set palette=NULL. Here are some examples:

library(WVPlots)
library(ggplot2)

mpg = ggplot2::mpg
mpg$trans = gsub("\\(.*$", '', mpg$trans)
 
# default palette: Dark2 
DoubleDensityPlot(mpg, "cty", "trans", "City driving mpg by transmission type")


# set a different Brewer color palette
DoubleDensityPlot(mpg, "cty", "trans", 
                  "City driving mpg by transmission type",
                  palette = "Accent")


# set a custom palette
cmap = c("auto" = "#7b3294", "manual" = "#008837")

DoubleDensityPlot(mpg, "cty", "trans", 
                  "City driving mpg by transmission type",
                  palette=NULL) + 
  scale_color_manual(values=cmap) + 
  scale_fill_manual(values=cmap)

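The same palette=NULL route should also work for viridis. Here is a minimal sketch, assuming ggplot2 3.0.0 or later (where scale_color_viridis_d() and scale_fill_viridis_d() are available); the variable name p is only for illustration.

```r
library(WVPlots)
library(ggplot2)

# same data preparation as in the examples above
mpg = ggplot2::mpg
mpg$trans = gsub("\\(.*$", '', mpg$trans)

# turn off the built-in Brewer palette, then add the discrete viridis scales
p = DoubleDensityPlot(mpg, "cty", "trans",
                      "City driving mpg by transmission type",
                      palette = NULL) +
  scale_color_viridis_d() +
  scale_fill_viridis_d()

print(p)
```

Both the color and fill scales are set, as in the custom palette example, so that the density outlines and the shaded areas stay in agreement.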

For other plots, the user can now specify the desired color for different elements of the graph.

title = "Count of cars by number of carburetors and cylinders"

# default fill: darkblue
ShadowPlot(mtcars, "carb", "cyl",
           title = title)


# specify fill
ShadowPlot(mtcars, "carb", "cyl",
           title = title,
           fillcolor = "#a6611a")


We hope that these changes make WVPlots even more useful to our users. For examples of several of the visualizations in WVPlots, see this example vignette. For the complete list of visualizations, see the reference page.


Advanced Data Reshaping in Python and R

This note is a simple data wrangling example worked using both the Python data_algebra package and the R cdata package. Both of these packages make data wrangling easy through the use of coordinatized data concepts (relying heavily on Codd’s “rule of access”).

The advantages of data_algebra and cdata are:

  • The user specifies their desired transform declaratively, by example and in data: work an example, and then write down what you want (we have a tutorial on this here).
  • The transform systems can print what a transform is going to do. This makes reasoning about data transforms much easier.
  • The transforms, as they themselves are written as data, can be easily shared between systems (such as R and Python).
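As a small illustration of the first two points, here is a sketch using the R cdata package on the built-in iris data. This is not the example from the full article. The control table, the id column, and the names flower_part, Length, and Width are choices made up for this sketch.

```r
library(cdata)

# a few rows of iris, plus an id column so each row record has a unique key
d <- head(iris, 3)
d$id <- seq_len(nrow(d))

# the transform is written down as data: each cell names the source column
# that should land in that position of the output block
control_table <- data.frame(
  flower_part = c("Petal", "Sepal"),
  Length = c("Petal.Length", "Sepal.Length"),
  Width = c("Petal.Width", "Sepal.Width"),
  stringsAsFactors = FALSE
)

transform <- rowrecs_to_blocks_spec(
  control_table,
  recordKeys = c("id", "Species")
)

# printing the specification describes what the transform is going to do
print(transform)

# apply the transform: each input row becomes a two-row block
res <- layout_by(transform, d)
print(res)
```

Because control_table is an ordinary data frame, it can be saved, inspected, or passed to another system like any other data.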



New Getting Started with vtreat Documentation

Win-Vector LLC’s Dr. Nina Zumel has just released some new vtreat documentation.

vtreat is an all-in-one data preparation system that helps defend your machine learning algorithms from:

  • Missing values
  • Large cardinality categorical variables
  • Novel levels from categorical variables
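To make two of these concrete (missing values and novel levels), here is a minimal sketch using the R version of vtreat. The toy data and column names are made up for illustration, and the sketch uses the unsupervised designTreatmentsZ() interface, which is only one of several ways to build a treatment plan. It does not demonstrate the large cardinality case.

```r
library(vtreat)

# toy training data: x has missing values, color is categorical
train <- data.frame(
  x = rep(c(1, NA, 3, 4, NA, 6, 7, 8, 9, 10), 2),
  color = rep(c("red", "blue"), 10),
  stringsAsFactors = FALSE
)

# design an outcome-free treatment plan from the training data
plan <- designTreatmentsZ(train, varlist = c("x", "color"), verbose = FALSE)

# new data with a missing x and a color level ("green") never seen in training
new_data <- data.frame(
  x = c(NA, 2.5),
  color = c("red", "green"),
  stringsAsFactors = FALSE
)

# prepare() returns an all-numeric frame with no missing values
treated <- prepare(plan, new_data)
print(treated)
```

The treated frame is what one would hand to a downstream model in place of the raw columns.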

I hoped she could get the Python vtreat documentation up to parity with the R vtreat documentation. But I think she really hit the ball out of the park, and went way past that.

The new documentation consists of three “getting started” guides. These guides deliberately overlap, so you don’t have to read them all. Just read the one suited to your problem and go.

The new guides:

Perhaps we can back-port the new guides to the R version at some point.