
R tip: Make Your Results Clear with sigr

R is designed to make working with statistical models fast, succinct, and reliable.

For instance, building a model is a one-liner:

model <- lm(Petal.Length ~ Sepal.Length, data = iris)

And producing a detailed diagnostic summary of the model is also a one-liner:

summary(model)

# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = iris)
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -2.47747 -0.59072 -0.00668  0.60484  2.49512 
# 
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  -7.10144    0.50666  -14.02   <2e-16 ***
# Sepal.Length  1.85843    0.08586   21.65   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.8678 on 148 degrees of freedom
# Multiple R-squared:   0.76,   Adjusted R-squared:  0.7583 
# F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16

However, useful as the above is, it isn't exactly presentation-ready. To formally report the R-squared of our model we would have to cut and paste this information from the summary, which is a needlessly laborious and possibly error-prone step.
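
One could pull the values out of the summary object programmatically (the component names below are the standard ones for summary.lm results), but assembling them into a readable sentence is still left to you:

s <- summary(model)
s$r.squared    # the "Multiple R-squared" value shown above
s$fstatistic   # the F statistic and its degrees of freedom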

With the sigr package this can be made much easier:

library("sigr")
Rsquared <- wrapFTest(model)
print(Rsquared)

# [1] "F Test summary: (R2=0.76, F(1,148)=468.6, p<1e-05)."

And this formal summary can be directly rendered into many formats (LaTeX, HTML, Markdown, and ASCII).

render(Rsquared, format="html")

F Test summary: (R2=0.76, F(1,148)=468.6, p<1e-05).
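
It also fits naturally into R Markdown workflows (a minimal sketch of our own, using the Markdown format mentioned above), so the reported statistics always come straight from the model object:

# render the summary as markdown for use in a report
render(Rsquared, format = "markdown")

# or embed it inline in the document text, for example:
#   The model fit is strong: `r render(Rsquared, format = "markdown")`.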

sigr can make your publication workflow easier, more repeatable, and more reliable.


For loops in R can lose class information

Did you know that R's for() loop control structure drops class annotations from vectors?
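
A small illustration of the effect (our own example, not necessarily the one used in the full article): iterating over a Date vector with for() yields the underlying numeric values, because the class attribute is not carried along.

d <- as.Date(c("2018-01-01", "2018-01-02"))

for(di in d) {
  print(di)       # prints the underlying day counts (e.g. 17532), not Dates
}

for(i in seq_along(d)) {
  print(d[[i]])   # indexing the original vector preserves the Date class
}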


Our Differential Privacy Mini-series

We’ve just finished off a series of articles on some recent research results applying differential privacy to improve machine learning. Some of these results are pretty technical, so we thought it was worth working through concrete examples. And some of the original results are locked behind academic journal paywalls, so we’ve tried to touch on the highlights of the papers, and to play around with variations of our own.

Image credit: "Blurry snowflakes" stock by cosmicgallifrey (d3inho1)

  • A Simpler Explanation of Differential Privacy: Quick explanation of epsilon-differential privacy, and an introduction to an algorithm for safely reusing holdout data, recently published in Science (Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth, “The reusable holdout: Preserving validity in adaptive data analysis”, Science, vol 349, no. 6248, pp. 636-638, August 2015).

    Note that Cynthia Dwork is one of the inventors of differential privacy, originally used in the analysis of sensitive information.

  • Using differential privacy to reuse training data: Specifically, how differential privacy helps you build efficient encodings of categorical variables with many levels from your training data without introducing undue bias into downstream modeling.
  • A simple differentially private-ish procedure: The bootstrap as an alternative to Laplace noise for introducing privacy (a minimal sketch of the Laplace mechanism appears after this list).
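
As a small self-contained illustration of the kind of mechanism the series builds on (our own sketch of the classic Laplace mechanism, not the bootstrap procedure from the article): to release a count with epsilon-differential privacy, one adds Laplace noise scaled to the query's sensitivity.

# Laplace(0, scale) noise, generated as the difference of two exponentials
rlaplace <- function(n, scale) {
  rexp(n, rate = 1/scale) - rexp(n, rate = 1/scale)
}

# epsilon-differentially private release of a count
# (a count has sensitivity 1: adding or removing one row changes it by at most 1)
private_count <- function(true_count, epsilon) {
  true_count + rlaplace(1, scale = 1/epsilon)
}

set.seed(2015)
private_count(42, epsilon = 0.5)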

Our R code and experiments are available on GitHub here, so you can try the experiments and variations yourself.
