
R Tip: Use Named Vectors to Re-Map Values

Here is an R tip. Want to re-map a column of values? Use a named vector as the mapping.

Example:

library("dplyr")
library("wrapr")

head(starwars[, qc(name, gender)])

# # A tibble: 6 x 2
# name           gender
# <chr>          <chr> 
# 1 Luke Skywalker male  
# 2 C-3PO          NA    
# 3 R2-D2          NA    
# 4 Darth Vader    male  
# 5 Leia Organa    female
# 6 Owen Lars      male  

(For qc() please see R Tip: Use qc() For Fast Legible Quoting.)

Now suppose we want to remap the gender designations to a more concise notation.

The key point is to specify the transformation as data, and not as code. This can be efficient and easy to maintain. For our example we will use qc() (from wrapr) to create the named vector we wish to use as the map (and optionally also :=).

map <- qc(female = F, hermaphrodite = H, male = M, none = N)
# # or if we want to use := to write this 
# # in names := values assignment style
# map <- qc(female, hermaphrodite, male, none) :=
#        qc(F,      H,             M,    N   ) 
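Because the mapping is data, it can also come from a lookup table (for example, one read from a file) instead of being typed in. Here is a minimal base R sketch of that idea; the lookup data frame, its column names from and to, and the name map_from_table are hypothetical, chosen only for illustration.

```r
# Hypothetical lookup table; in practice this might be read from a CSV file.
lookup <- data.frame(
  from = c("female", "hermaphrodite", "male", "none"),
  to   = c("F", "H", "M", "N"),
  stringsAsFactors = FALSE
)

# setNames() attaches the "from" values as names on the "to" values,
# giving a named character vector of the same form as map above.
map_from_table <- setNames(lookup$to, lookup$from)

print(map_from_table)
```

Adding or changing a mapping is then an edit to the table, not to the code.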

It is then a simple matter to transform the column (using either base R or dplyr style notations).

# base R version of the mapping
starwars$gender <- map[starwars$gender]

# # dplyr version of the mapping
# starwars <- starwars %>% mutate(., gender = map[gender])

head(starwars[, qc(name, gender)])

# # A tibble: 6 x 2
# name           gender
# <chr>          <chr> 
# 1 Luke Skywalker M     
# 2 C-3PO          NA    
# 3 R2-D2          NA    
# 4 Darth Vader    M     
# 5 Leia Organa    F     
# 6 Owen Lars      M     

This sort of “using a vector as a mapping function” is often easier than a join, a nested if, or a case statement.
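It helps to see what the lookup does on values that are missing or not in the map. Below is a small base R sketch on toy data (the gender_toy vector and the "unknown_label" value are made up for illustration and are not from starwars). Indexing by name returns NA for both cases, and the result carries the lookup keys as names, which unname() removes.

```r
# Toy input: one NA and one value that is not a key of the map
gender_toy <- c("male", NA, "female", "unknown_label")

# The mapping, written as plain data (a named character vector)
map <- c(female = "F", hermaphrodite = "H", male = "M", none = "N")

# Indexing by name performs the lookup; unname() drops the carried-over names
mapped <- unname(map[gender_toy])

print(mapped)
# [1] "M" NA  "F" NA
```

So any value without an entry in the map becomes NA, which is worth checking for before overwriting a column.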

For a code-like presentation of named vectors, try map_to_char():

map_to_char(map)

# [1] "c('female' = 'F', 'hermaphrodite' = 'H', 'male' = 'M', 'none' = 'N')"

4 thoughts on “R Tip: Use Named Vectors to Re-Map Values”

  1. Just a note (and I wish I had re-checked this before posting): the direct mapping technique does not appear to work on remote (database) dplyr examples. However, this is not a problem, as the same result is easy to achieve with a left_join().

    library("dplyr")
    db <- DBI::dbConnect(RSQLite::SQLite(), 
                         ":memory:")
    
    dLocal <- starwars[, c("name", "gender")]
    dRemote <- dplyr::copy_to(db, dLocal, "dRemote")
                        
    map <- c("female" = "F", 
             "hermaphrodite" = "H", 
             "male" = "M", 
             "none" = "N")
    
    dRemote %>% 
      mutate(., gender = map[gender])
    #> Error in eval_bare(call, env): object 'gender' not found
    
    dRemote %>% 
      mutate(., gender = map[.data$gender])
    #> Error in eval_bare(call, env): object '.data' not found
    
    # direct left join solution
    mapf <- data.frame(gender = names(map),
                       gender_mapped = as.character(map),
                       stringsAsFactors = FALSE)
    dRemote %>% 
      left_join(., mapf, by = "gender", copy=TRUE) %>%
      select(., -gender) %>%
      rename(., gender = gender_mapped)
    # # Source:   lazy query [?? x 2]
    # # Database: sqlite 3.19.3 [:memory:]
    #   name               gender
    #   <chr>              <chr> 
    # 1 Luke Skywalker     M     
    # 2 C-3PO              NA    
    # 3 R2-D2              NA    
    # 4 Darth Vader        M     
    # 5 Leia Organa        F     
    # 6 Owen Lars          M     
    # 7 Beru Whitesun lars F     
    # 8 R5-D4              NA    
    # 9 Biggs Darklighter  M     
    # 10 Obi-Wan Kenobi     M     
    # # ... with more rows
    
    
    # cdata mapping (doesn't require dplyr)
    mapf <- data.frame(gender = names(map),
                       gender_mapped = as.character(map),
                       stringsAsFactors = FALSE)
    DBI::dbWriteTable(db, "mapf", mapf)
    cdata::map_fields_q("dRemote", "gender", "mapf", db, "dRes")
    #> [1] "dRes"
    head(DBI::dbGetQuery(db, "SELECT * FROM dRes") )
    #             name gender gender_mapped
    # 1 Luke Skywalker   male             M
    # 2          C-3PO   <NA>          <NA>
    # 3          R2-D2   <NA>          <NA>
    # 4    Darth Vader   male             M
    # 5    Leia Organa female             F
    # 6      Owen Lars   male             M
    
    DBI::dbDisconnect(db)
    
  2. If I understand this correctly, I have to remap all values. Is this correct? In my case I often want to remap just a few things and leave the other values unchanged… This will not be possible with your way of remapping. Am I correct?

    1. That can be done with an ifelse().

      library("dplyr")
      library("wrapr")
      
      map <- qc(female = F, male = M)
      
      . <- starwars
      .$gender <- ifelse(.$gender %in% names(map), 
                         map[.$gender],
                         .$gender)
      res <- .
      
      # head(res[, qc(name, gender)])
      table(res$gender, useNA = 'ifany')
      #   F hermaphrodite             M          none          <NA> 
      #  19             1            62             2             3 
      
