Update: Since R 4.0.0
, the default for stringAsFactors
has been set to FALSE
. It’s a good idea to be explicit, so that the code can run more robustly across different R versions.
One of the default behaviors in older versions of R, which is sometimes too aggressive, is converting strings to a datatype called factor
.
t1 <- data.frame(label = rep(c("A", "B"), 5), stringsAsFactors = TRUE)
t1$label
## [1] A B A B A B A B A B
## Levels: A B
If a new string value is entered:
t1$label[1] <- "C"
## Warning in `[<-.factor`(`*tmp*`, 1, value = structure(c(NA, 2L, 1L, 2L, :
## invalid factor level, NA generated
R won’t recognize the string and instead generated a missing value. To resolve this issue:
t2 <- data.frame(label = rep(c("A", "B"), 5), stringsAsFactors = FALSE)
t2
## label
## 1 A
## 2 B
## 3 A
## 4 B
## 5 A
## 6 B
## 7 A
## 8 B
## 9 A
## 10 B
t2$label[1] <- "C"
t2
## label
## 1 C
## 2 B
## 3 A
## 4 B
## 5 A
## 6 B
## 7 A
## 8 B
## 9 A
## 10 B
The column can be transformed into factor
later if necessary:
t2 <- transform(t2, label2 = factor(label))
t2$label2
## [1] C B A B A B A B A B
## Levels: A B C
See also: R-blogger post