In a previous article, we showed the use of partial pooling, or hierarchical/multilevel models, for level coding high-cardinality categorical variables in `vtreat`

. In this article, we will discuss a little more about the how and why of partial pooling in `R`

.

We will use the `lme4`

package to fit the hierarchical models. The acronym “lme” stands for “linear mixed-effects” models: models that combine so-called “fixed effects” and “random effects” in a single (generalized) linear model. The `lme4`

documentation uses the random/fixed effects terminology, but we are going to follow Gelman and Hill, and avoid the use of the terms “fixed” and “random” effects.

The varying coefficients [corresponding to the levels of a categorical variable] in a multilevel model are sometimes called

random effects, a term that refers to the randomness in the probability model for the group-level coefficients….

The term

fixed effectsis used in contrast to random effects – but not in a consistent way! … Because of the conflicting definitions and advice, we will avoid the terms “fixed” and “random” entirely, and focus on the description of the model itself…

– Gelman and Hill 2007, Chapter 11.4

We will also restrict ourselves to the case that `vtreat`

considers: partially pooled estimates of conditional group expectations, with no other predictors considered.

Continue reading Partial Pooling for Lower Variance Variable Encoding