
WVPlots: example plots in R using ggplot2

Nina Zumel and I have been working on packaging our favorite graphing techniques in a more reusable way that emphasizes the analysis task at hand over the steps needed to produce a good visualization. The idea is: we sacrifice some of the flexibility and composability inherent to ggplot2 in R for a menu of prescribed presentation solutions (which we are sharing on GitHub).

For example, the plot below, showing both an observed discrete empirical distribution (as stems) and a matching theoretical distribution (as bars), is a built-in “one-liner.”

[Figure: observed discrete empirical distribution (stems) with matching theoretical distribution (bars)]

Please read on for some of the ideas and how to use this package.

The graph above is actually the product of a number of presentation decisions:

  • Using a discrete histogram approach to summarize data (instead of a kernel density approach) to create a presentation more familiar to business partners.
  • Using a Cleveland-style dot-with-stem plot instead of wide bars to emphasize that the stem heights represent total counts (and to avoid the common accidental misapprehension that bar areas represent totals).
  • Automatically fitting and rendering the matching (properly count-scaled) normal distribution as thin translucent bars for easy comparison (again to try and de-emphasize area).
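
As a rough illustration of what “count-scaled” means in the last point, the sketch below fits a normal distribution to discrete data and converts its density into expected counts. This is not the WVPlots implementation; the simulated data and all variable names are assumptions made for the example, and the scaling assumes unit-width bins.

```r
# Sketch only: not the code WVPlots uses internally.
set.seed(2016)
x <- rbinom(1000, size = 20, prob = 0.5)  # hypothetical discrete data

# Observed counts at each integer value (these would be the stem heights).
values <- seq(min(x), max(x))
observed <- as.vector(table(factor(x, levels = values)))

# Fit a normal by sample mean and standard deviation, then scale the
# density to counts. With unit-width bins the expected count at value k
# is approximately n * dnorm(k, mean, sd).
expected <- length(x) * dnorm(values, mean = mean(x), sd = sd(x))

comparison <- data.frame(value = values, observed = observed, expected = expected)
print(comparison)
```

Putting the observed and expected columns on the same count scale is what lets stem heights and bar heights be compared directly.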

All of these decisions are triggered by choosing which plot to use from the WVPlots library. In this case we chose WVPlots::PlotDistCountNormal. For an audience of analysts, we might instead choose an area/density-based representation (by specifying WVPlots::PlotDistDensityNormal), which is shown below:

[Figure: area/density-based presentation produced by WVPlots::PlotDistDensityNormal]

Switching the chosen plot simultaneously changes many of the details of the presentation. WVPlots is designed to make this change simple by insisting on a very simple unified calling convention. The plot calls all insist on roughly the following arguments:

  • frame: data frame containing the data to be presented.
  • xvar: name of the x variable column in the data frame.
  • yvar: name of the y variable column in the data frame (not part of the shown density plots!).
  • title: text title for the plot.

This rigid calling interface is easy to remember and makes switching between plot types very easy. We have also made title a required argument, as we feel all plots should be labeled.
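
A minimal sketch of the calling convention follows. It assumes WVPlots and ggplot2 are already installed; the data frame `d` and the title text are hypothetical. Arguments are passed by position (data frame, x column name, title) in case argument names differ between package versions.

```r
# Sketch only: the example data and title are made up for illustration.
set.seed(52523)
d <- data.frame(x = rbinom(500, size = 20, prob = 0.5))
plot_title <- "Example discrete distribution"

if (requireNamespace("WVPlots", quietly = TRUE)) {
  # Count-based presentation: data frame, x column name, title.
  count_plot <- WVPlots::PlotDistCountNormal(d, "x", plot_title)
  # Density-based presentation: same arguments, different function.
  density_plot <- WVPlots::PlotDistDensityNormal(d, "x", plot_title)
  print(count_plot)
  print(density_plot)
} else {
  message("WVPlots is not installed; skipping the plots.")
}
```

Only the function name changes between the two calls, which is the point of the shared interface.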

What we are trying to do is separate the specification of exactly what plot we want from the details of how to produce it. We find this separation of concerns and encapsulation of implementation allows us to routinely use rich annotated graphics. Below are a few more examples:

[Figures: three additional WVPlots example plots]

We know this collection doesn’t rise to the standard of a complete “grammar of graphics” or even a substantial library (which is why we are not submitting it to CRAN). But it can become (through accumulation) a reusable repository of a number of specific graphing tasks done well. It is also a chance to eventually document presentation design decisions (though we haven’t gotten far on that yet). The complete current set of graphs is viewable as WVPlots examples at RStudio’s RPubs site.

7 thoughts on “WVPlots: example plots in R using ggplot2”

  1. Great post. This graphing explosion over the past few years means people need to read/bookmark a dozen blog posts a day to try and keep up with code snippets (#Hyperbole). A central reusable format is a great idea.

  2. When I tried to install I got package ‘WVPlots’ is not available (for R version 3.0.3). Will it be made available?

    1. Sorry if that wasn’t clear. WVPlots is available from GitHub, not CRAN. We share the code at https://github.com/WinVector/WVPlots and it can be installed by typing (in R):

      install.packages(c('devtools','ggplot2'))
      devtools::install_github('WinVector/WVPlots',build_vignettes = TRUE)
      

      After that you can get to the vignettes by typing in R:

      library("WVPlots")
      help("WVPlots")
      

      And navigating to the index will show you all the functions, and the vignette has all the rendered plots. If the install has problems you can try leaving out the “,build_vignettes = TRUE” portion.

      Also, we have been using Microsoft R Open 3.2.3 with RStudio.

  3. I get the following when I try
    install.packages(c(‘devtools’,’ggplot2′))
    devtools::install_github(‘WinVector/WVPlots’,build_vignettes = TRUE)
    library(‘WVPlots’)
    help(‘WVPlots’)

    Downloading GitHub repo WinVector/WVPlots@master
    from URL https://api.github.com/repos/WinVector/WVPlots/zipball/master
    Error in curl::curl_fetch_memory(url, handle = handle) :
    Problem with the SSL CA cert (path? access rights?)
    > library(‘WVPlots’)
    Error in library(“WVPlots”) : there is no package called ‘WVPlots’
    > help(‘WVPlots’)
    No documentation for ‘WVPlots’ in specified packages and libraries:
    you could try ‘??WVPlots’

    1. From the error messages it looks like the curl command that devtools uses to install from GitHub is failing. Every error after that is just a symptom of the later commands not being runnable due to the install failure.

      I know this is frustrating, but the issue is that devtools failed to install WVPlots, likely due to something in your computer or network configuration being different from what devtools expected.

      curl is a network downloading tool. The reasons it may fail include: curl/libcurl not being installed on your system, GitHub not being accessible from your system (try visiting the URL https://github.com/WinVector/WVPlots in your browser to check), SSL/https/certificates not being configured on your computer, GitHub being blocked by your internet service provider, not being attached to the network, a transient glitch at GitHub, ad-blockers, anti-virus/network-security software, and many other possible causes.

      From the message it looks like the SSL certificate is the problem, but I don’t know what the particular issue is.

      I am sorry you are having trouble, but the devtools library is supposed to supply “install from GitHub” as a service, and that is what is failing. I tried re-pasting the commands on my machine and they did work (copied from my comment, which doesn’t have the smart quotes that mess things up). I would at least check that R (and RStudio, if you use it) is up to date. R recently changed its compiler toolchain (which might have weird side-effects). You could also try install.packages('RCurl') to see if the curl issue is easily fixable. Of course, re-installing R and other tools can be its own can of worms.
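
A few basic checks can be run from the R console before reinstalling anything. This is only a sketch of a starting point, not a complete diagnosis; it assumes a reasonably recent R and uses the curl package only if it is already installed. The variable names are made up for the example.

```r
# Sketch only: basic environment checks, not a full diagnosis.

# Does this build of R have libcurl support compiled in?
has_libcurl <- isTRUE(unname(capabilities("libcurl")))
print(has_libcurl)

# Is the curl package (the one named in the error message) installed?
has_curl_pkg <- requireNamespace("curl", quietly = TRUE)
print(has_curl_pkg)

if (has_curl_pkg) {
  # SSL backend reported by the libcurl that the curl package links against.
  print(curl::curl_version()$ssl_version)
  # Rough check for general network access.
  print(curl::has_internet())
}
```

If these all look healthy, the certificate configuration on the machine is the more likely place to look.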

      Edit 5/21/2016: I’ve added some more detailed install instructions here http://www.win-vector.com/blog/2016/05/installing-wvplots-and-knitting-r-markdown/ .

      1. You are very welcome. It was the feedback (comments and email) and patience of several readers who finally helped me correct and tune the instructions. The big issue being: I don’t regularly see the pain in installing things that are already installed (and hopefully you and the other readers eventually will not see such pain either).
