R often uses a concept of
factors to re-encode strings. This can be too early and too aggressive. Sometimes a string is just a string.
It is often claimed Sigmund Freud said “Sometimes a cigar is just a cigar.”
To avoid problems, delay re-encoding of strings by using
stringsAsFactors = FALSE when creating data.frames. For example:
    d <- data.frame(label = rep("tbd", 5))
    d$label[[2]] <- "north"
    #> Warning in `[[<-.factor`(`*tmp*`, 2, value = structure(c(1L, NA, 1L, 1L, :
    #> invalid factor level, NA generated
    print(d)
    #>   label
    #> 1   tbd
    #> 2  <NA>
    #> 3   tbd
    #> 4   tbd
    #> 5   tbd
Notice our new value was not copied in!
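The reason becomes clearer when looking at the factor's levels. The following is a minimal sketch, not part of the original example. It passes stringsAsFactors = TRUE explicitly, an assumption made here so the behavior reproduces on R 4.0.0 and later, where data.frame() no longer converts strings to factors by default. Extending the levels is shown only as one possible workaround.

```r
# Explicitly request factor conversion so the example behaves the same
# regardless of the default in the installed R version.
d <- data.frame(label = rep("tbd", 5), stringsAsFactors = TRUE)

# The column is a factor, and "tbd" is the only level it knows about.
class(d$label)
levels(d$label)

# Assigning a value outside the level set produces NA (with a warning).
d$label[[2]] <- "north"
is.na(d$label[[2]])

# One possible workaround: extend the level set before assigning.
levels(d$label) <- c(levels(d$label), "north")
d$label[[2]] <- "north"
as.character(d$label)
```

A factor stores integer codes that point into a fixed set of levels, so a string that is not already a level has no code to map to.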
The fix is easy: use
stringsAsFactors = FALSE.
    d <- data.frame(label = rep("tbd", 5), stringsAsFactors = FALSE)
    d$label[[2]] <- "north"
    print(d)
    #>   label
    #> 1   tbd
    #> 2 north
    #> 3   tbd
    #> 4   tbd
    #> 5   tbd
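If a factor is wanted in the end, one way to delay the re-encoding is to convert only after all values are in place. This is a small sketch of that idea, not something the example above requires:

```r
# Build and fill the column as plain strings first.
d <- data.frame(label = rep("tbd", 5), stringsAsFactors = FALSE)
d$label[[2]] <- "north"

# Convert to a factor only once the set of values is settled,
# so every value present becomes a level.
d$label <- factor(d$label)
levels(d$label)
```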
As is often the case: base
R works okay in default mode and works very well if you judiciously change a few defaults. There is much less need to whole-hog replace
R functionality than some claim.
Note: the above pattern of pre-building a
data.frame and filling values by addressing row/column index sets is a very effective (and underappreciated) way to build up data (often easier and quicker than binding rows or columns).
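A rough sketch of the pre-build-and-fill pattern follows. The column names id and score and the values filled in are hypothetical, chosen only for illustration:

```r
n <- 5

# Pre-build the frame with placeholder values of the intended types.
d <- data.frame(
  id = seq_len(n),
  label = rep("tbd", n),
  score = rep(NA_real_, n),
  stringsAsFactors = FALSE
)

# Fill a single cell.
d$label[[2]] <- "north"

# Fill a set of rows in one column.
d[c(1, 3), "label"] <- "south"

# Fill a block of rows across several columns at once;
# each list element supplies the values for one column.
d[4:5, c("label", "score")] <- list(c("east", "west"), c(0.5, 1.5))

print(d)
```

Because the frame keeps its size and column types throughout, each step is a plain indexed assignment rather than a copy-and-bind.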