Posted on Categories data science, Programming, StatisticsTags , ,

More on sigr

If you’ve read our previous R Tip on using sigr with linear models, you might have noticed that the lm() summary object does in fact carry the R-squared and F statistics, both in the printed form:

model_lm <- lm(formula = Petal.Length ~ Sepal.Length, data = iris)
(smod_lm <- summary(model_lm))
## 
## Call:
## lm(formula = Petal.Length ~ Sepal.Length, data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.47747 -0.59072 -0.00668  0.60484  2.49512 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -7.10144    0.50666  -14.02   <2e-16 ***
## Sepal.Length  1.85843    0.08586   21.65   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8678 on 148 degrees of freedom
## Multiple R-squared:   0.76,  Adjusted R-squared:  0.7583 
## F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16

and also in the summary() object:

c(R2 = smod_lm$r.squared, F = smod_lm$fstatistic[1])

##          R2     F.value 
##   0.7599546 468.5501535

Note, though, that while the summary reports the model’s significance, it does not carry it as a specific summary() object item. sigr::wrapFTest() is a convenient way to extract the model’s R-squared and F statistic and simultaneously calculate the model significance, as is required by many scientific publications.

sigr is even more helpful for logistic regression, via glm(), which reports neither the model’s chi-squared statistic nor its significance.

iris$isVersicolor <- iris$Species == "versicolor"

model_glm <- glm(
  isVersicolor ~ Sepal.Length + Sepal.Width,
  data = iris,
  family = binomial)

(smod_glm <- summary(model_glm))

## 
## Call:
## glm(formula = isVersicolor ~ Sepal.Length + Sepal.Width, family = binomial, 
##     data = iris)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9769  -0.8176  -0.4298   0.8855   2.0855  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    8.0928     2.3893   3.387 0.000707 ***
## Sepal.Length   0.1294     0.2470   0.524 0.600247    
## Sepal.Width   -3.2128     0.6385  -5.032 4.85e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 190.95  on 149  degrees of freedom
## Residual deviance: 151.65  on 147  degrees of freedom
## AIC: 157.65
## 
## Number of Fisher Scoring iterations: 5

To get the significance of a logistic regression model, call wrapr::wrapChiSqTest():

library(sigr)
(chi2Test <- wrapChiSqTest(model_glm))

## [1] “Chi-Square Test summary: pseudo-R2=0.21 (X2(2,N=150)=39, p<1e-05).”

Notice that the fit summary also reports a pseudo-R-squared. You can extract the values directly off the sigr object, as well:

str(chi2Test)

## List of 10
##  $ test          : chr "Chi-Square test"
##  $ df.null       : int 149
##  $ df.residual   : int 147
##  $ null.deviance : num 191
##  $ deviance      : num 152
##  $ pseudoR2      : num 0.206
##  $ pValue        : num 2.92e-09
##  $ sig           : num 2.92e-09
##  $ delta_deviance: num 39.3
##  $ delta_df      : int 2
##  - attr(*, "class")= chr [1:2] "sigr_chisqtest" "sigr_statistic"

And of course you can render the sigr object into one of several formats (Latex, html, markdown, and ascii) for direct inclusion in a report or publication.

render(chi2Test, format = "html")

Chi-Square Test summary: pseudo-R2=0.21 (χ2(2,N=150)=39, p<1e-05).

By the way, if you are interested, we give the explicit formula for calculating the significance of a logistic regression model in Practical Data Science with R.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.