Posted on 9 Comments on Wanted: A Perfect Scatterplot (with Marginals)

## Wanted: A Perfect Scatterplot (with Marginals)

We saw this scatterplot with marginal densities the other day, in a blog post by Thomas Wiecki:

The graph was produced in Python, using the seaborn package. Seaborn calls it a “jointplot;” it’s called a “scatterhist” in Matlab, apparently. The seaborn version also shows the strength of the linear relationship between the x and y variables. Nice.

I like this plot a lot, but we’re mostly an R shop here at Win-Vector. So we asked: can we make this plot in ggplot2? Natively, ggplot2 can add rugs to a scatterplot, but doesn’t immediately offer marginals, as above.

However, you can use Dean Attali’s ggExtra package. Here’s an example using the same data as the seaborn jointplot above; you can download the dataset here.

library(ggplot2)
library(ggExtra)

plot_center = ggplot(frm, aes(x=total_bill,y=tip)) +
geom_point() +
geom_smooth(method="lm")

# default: type="density"
ggMarginal(plot_center, type="histogram")

I didn’t bother to add the internal annotation for the goodness of the linear fit, though I could.

The ggMarginal() function goes to heroic effort to line up the coordinate axes of all the graphs, and is probably the best way to do a scatterplot-plus-marginals in ggplot (you can also do it in base graphics, of course). Still, we were curious how close we could get to the seaborn version: marginal density and histograms together, along with annotations. Below is our version of the graph; we report the linear fit’s R-squared, rather than the Pearson correlation.

# our own (very beta) plot package: details later
library(WVPlots)