
## More on sigr

If you’ve read our previous R Tip on using sigr with linear models, you might have noticed that the `lm()` summary object does in fact carry the R-squared and F statistics, both in the printed form:

```r
model_lm <- lm(formula = Petal.Length ~ Sepal.Length, data = iris)
(smod_lm <- summary(model_lm))
##
## Call:
## lm(formula = Petal.Length ~ Sepal.Length, data = iris)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.47747 -0.59072 -0.00668  0.60484  2.49512
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -7.10144    0.50666  -14.02   <2e-16 ***
## Sepal.Length  1.85843    0.08586   21.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8678 on 148 degrees of freedom
## Multiple R-squared:   0.76,  Adjusted R-squared:  0.7583
## F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16
```

and also in the `summary()` object:

```r
c(R2 = smod_lm$r.squared, F = smod_lm$fstatistic[1])

##          R2     F.value
##   0.7599546 468.5501535
```

Note, though, that while the summary reports the model’s significance, it does not carry it as a specific `summary()` object item. `sigr::wrapFTest()` is a convenient way to extract the model’s R-squared and F statistic and simultaneously calculate the model significance, as is required by many scientific publications.
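For readers curious about what that calculation involves, here is a minimal base-R sketch that recovers the p-value from the stored F statistic and its degrees of freedom. The model is refit so the block runs on its own; `f` and `p_value` are illustrative names, and this is not necessarily how `sigr` does it internally.

```r
# Refit the model from above so this block is self-contained.
model_lm <- lm(Petal.Length ~ Sepal.Length, data = iris)
smod_lm <- summary(model_lm)

# summary.lm stores the F statistic as a named vector: value, numdf, dendf.
f <- smod_lm$fstatistic

# The upper tail of the F distribution gives the overall model p-value.
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
print(p_value)
```

This is the step `print(summary(model_lm))` performs for display without storing the result, which is why the significance has to be recomputed if you want it as a number.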

`sigr` is even more helpful for logistic regression, via `glm()`, which reports neither the model’s chi-squared statistic nor its significance.

```r
iris$isVersicolor <- iris$Species == "versicolor"

model_glm <- glm(
  isVersicolor ~ Sepal.Length + Sepal.Width,
  data = iris,
  family = binomial)

(smod_glm <- summary(model_glm))

##
## Call:
## glm(formula = isVersicolor ~ Sepal.Length + Sepal.Width, family = binomial,
##     data = iris)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.9769  -0.8176  -0.4298   0.8855   2.0855
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)    8.0928     2.3893   3.387 0.000707 ***
## Sepal.Length   0.1294     0.2470   0.524 0.600247
## Sepal.Width   -3.2128     0.6385  -5.032 4.85e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 190.95  on 149  degrees of freedom
## Residual deviance: 151.65  on 147  degrees of freedom
## AIC: 157.65
##
## Number of Fisher Scoring iterations: 5
```

To get the significance of a logistic regression model, call `sigr::wrapChiSqTest()`:

```r
library(sigr)
(chi2Test <- wrapChiSqTest(model_glm))

## [1] "Chi-Square Test summary: pseudo-R2=0.21 (X2(2,N=150)=39, p<1e-05)."
```

Notice that the fit summary also reports a pseudo-R-squared. You can extract the values directly off the `sigr` object, as well:

```r
str(chi2Test)

## List of 10
##  $ test          : chr "Chi-Square test"
##  $ df.null       : int 149
##  $ df.residual   : int 147
##  $ null.deviance : num 191
##  $ deviance      : num 152
##  $ pseudoR2      : num 0.206
##  $ pValue        : num 2.92e-09
##  $ sig           : num 2.92e-09
##  $ delta_deviance: num 39.3
##  $ delta_df      : int 2
##  - attr(*, "class")= chr [1:2] "sigr_chisqtest" "sigr_statistic"
```

And of course you can render the `sigr` object into one of several formats (LaTeX, HTML, Markdown, and ASCII) for direct inclusion in a report or publication.

```r
render(chi2Test, format = "html")
```

Chi-Square Test summary: pseudo-R2=0.21 (χ2(2,N=150)=39, p<1e-05).

By the way, if you are interested, we give the explicit formula for calculating the significance of a logistic regression model in Practical Data Science with R.
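The book's derivation is not reproduced here, but the standard deviance-based calculation can be sketched in base R. This is a hedged illustration: it compares the drop in deviance against a chi-squared distribution and computes the pseudo-R-squared as the fraction of null deviance explained. The variable names (`d`, `delta_deviance`, `delta_df`, `pseudo_r2`) are chosen for this example, and the block should not be read as `sigr`'s source code.

```r
# Work on a copy so the built-in iris data is left untouched.
d <- iris
d$isVersicolor <- d$Species == "versicolor"

model_glm <- glm(
  isVersicolor ~ Sepal.Length + Sepal.Width,
  data = d,
  family = binomial)

# Improvement in deviance over the null (intercept-only) model,
# and the number of extra parameters used to achieve it.
delta_deviance <- model_glm$null.deviance - model_glm$deviance
delta_df <- model_glm$df.null - model_glm$df.residual

# Upper tail of the chi-squared distribution gives the model significance.
p_value <- pchisq(delta_deviance, df = delta_df, lower.tail = FALSE)

# Pseudo-R-squared: fraction of the null deviance explained by the model.
pseudo_r2 <- 1 - model_glm$deviance / model_glm$null.deviance

print(c(delta_deviance = delta_deviance, delta_df = delta_df,
        p_value = p_value, pseudo_r2 = pseudo_r2))
```

These quantities should line up with the `delta_deviance`, `delta_df`, `pValue`, and `pseudoR2` fields in the `str()` output above.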


## R tip: Make Your Results Clear with sigr

R is designed to make working with statistical models fast, succinct, and reliable.

For instance, building a model is a one-liner:

```r
model <- lm(Petal.Length ~ Sepal.Length, data = iris)
```

And producing a detailed diagnostic summary of the model is also a one-liner:

```r
summary(model)

# Call:
# lm(formula = Petal.Length ~ Sepal.Length, data = iris)
#
# Residuals:
#      Min       1Q   Median       3Q      Max
# -2.47747 -0.59072 -0.00668  0.60484  2.49512
#
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)
# (Intercept)  -7.10144    0.50666  -14.02   <2e-16 ***
# Sepal.Length  1.85843    0.08586   21.65   <2e-16 ***
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.8678 on 148 degrees of freedom
# Multiple R-squared:   0.76,   Adjusted R-squared:  0.7583
# F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16
```

However, useful as the above is, it isn’t exactly presentation-ready. To formally report the R-squared of our model, we would have to cut and paste this information from the summary. That is a needlessly laborious and possibly error-prone step.

With the `sigr` package this can be made much easier:

```r
library("sigr")
Rsquared <- wrapFTest(model)
print(Rsquared)

# [1] "F Test summary: (R2=0.76, F(1,148)=468.6, p<1e-05)."
```

And this formal summary can be directly rendered into many formats (LaTeX, HTML, Markdown, and ASCII).

```r
render(Rsquared, format="html")
```

F Test summary: (R2=0.76, F(1,148)=468.6, p<1e-05).

`sigr` can help make your publication workflow much easier and more repeatable/reliable.

## Quick Significance Calculations for A/B Tests in R

## Introduction

Let’s take a quick look at a very important and common experimental problem: checking if the difference in success rates of two Binomial experiments is statistically significant. This can arise in A/B testing situations such as online advertising, sales, and manufacturing.

We already share a free video course on a Bayesian treatment of planning and evaluating A/B tests (including a free Shiny application). Let’s now take a look at the should-be-simple task of building a summary statistic that includes a classic frequentist significance.
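As a rough illustration of the kind of calculation in question, here is a base-R sketch using Fisher's exact test on a 2x2 table of successes and failures. The counts are hypothetical, invented only for this example, and `fisher.test()` is one reasonable choice swapped in here rather than necessarily the method the full post develops.

```r
# Hypothetical A/B counts, invented purely for illustration.
successes <- c(A = 82, B = 115)
trials <- c(A = 1000, B = 1000)

# Observed success rate in each group.
rates <- successes / trials

# 2x2 contingency table: successes and failures per group.
tab <- rbind(success = successes, failure = trials - successes)

# Two-sided Fisher's exact test for a difference in success rates.
ft <- fisher.test(tab)

print(rates)
print(ft$p.value)
```

A small p-value here says the observed difference in rates would be unlikely if both groups shared the same underlying success rate. It says nothing about whether the difference is large enough to matter.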


## We Want to be Playing with a Moderate Number of Powerful Blocks

Many data scientists (and even statisticians) often suffer under one of the following misapprehensions:

• They believe a technique doesn’t work in their current situation (when in fact it does), leading to useless precautions and missed opportunities.
• They believe a technique does work in their current situation (when in fact it does not), leading to failed experiments or incorrect results.

I feel this happens less often if you are working with observable and composable tools of the proper scale. Somewhere between monolithic all-in-one systems and ad hoc one-off coding is a cognitive sweet spot where great work can be done.


## More documentation for Win-Vector R packages

The Win-Vector public R packages now all have new `pkgdown` documentation sites! (And, a thank-you to Hadley Wickham for developing the `pkgdown` tool.)

Please check them out (hint: `vtreat` is our favorite).


## sigr: Simple Significance Reporting

`sigr` is a simple `R` package that conveniently formats a few statistics and their significance tests. This allows the analyst to use the correct test no matter what modeling package or procedure they use.