| old_column |
|---|
| Here |
| is |
| a |
| column |
I haven’t posted for a while, and came across a tweet from Angie Jones that I really related to.
Not that my previous posts were intellectual thinkpieces, but I thought that I had to write about something novel or innovative to provide any level of value.
When I first started using R, my code was a mash-up of base R, dplyr, and data.table. I would reference a column by index and then by name. It was hard for me to follow, and I cringe at the idea that I sent some of this old code to colleagues.
I was trying to think of how many ways there are to do simple data cleaning tasks in R, and thought it would be fun to explore.
The rest of this post accomplishes exactly one task, renaming a column, and then shares some pics of my cats.
- Original column name: old_column
- Renamed column name: new_column
Every example will include a data.frame that is called df and will contain one column named old_column that we will rename as new_column:
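As a minimal sketch, the example data.frame could be built like this. The values are assumed from the table at the top of the post and are only placeholders; any single-column data.frame named df would do.

```r
# Build the example data.frame with one column named old_column.
# The four values are placeholders (an assumption); renaming does not depend on them.
df <- data.frame(old_column = c("Here", "is", "a", "column"))

# Inspect the column names before renaming.
names(df)
```

Re-running this line before each example resets df, so every renaming approach starts from the same old_column.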
Using Base R
The following examples will only use base R, meaning no additional packages will be required to run this code.
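Several of the base R examples below swap names and colnames freely. A quick sketch, assuming the same single-column df (the values are placeholders), suggests why that is safe: for a data.frame, both functions return the same character vector.

```r
# Same example data.frame as described above; the values are placeholders.
df <- data.frame(old_column = c("Here", "is", "a", "column"))

# For a data.frame, names() and colnames() report the same column names.
same_names <- identical(names(df), colnames(df))
same_names
```

This only holds for data frames. On a matrix, names and colnames are different things, so the two are not interchangeable there.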
- Call colnames on df and index the first column.
colnames(df)[1] <- "new_column"
- Call names on df and index the first column.
names(df)[1] <- "new_column"
- Call colnames on df and subset the first column, also using colnames.
colnames(df)[colnames(df) == "old_column"] <- "new_column"
- Call names on df and subset the first column, also using names.
names(df)[names(df) == "old_column"] <- "new_column"
- Call colnames on df and subset the first column using names.
colnames(df)[names(df) == "old_column"] <- "new_column"
- Call names on df and subset the first column using colnames.
names(df)[colnames(df) == "old_column"] <- 'new_column'
- Call colnames on df and subset using which. This returns the integer index of the column whose name is equal to "old_column".
colnames(df)[which(colnames(df) == "old_column")] <- "new_column"
- Since df only has one column, we can also call names on df:
names(df) <- "new_column"
- …or colnames on df:
colnames(df) <- "new_column"
- We can also use a different, less efficient approach. Instead of renaming the column, we can create a new column that is identical to old_column and name it new_column. Then we can remove old_column from our df:
# Create a new column called "new_column" that is an exact copy of "old_column"
df$new_column <- df$old_column
# Remove "old_column"
df$old_column <- NULL
- Getting a bit more abstract, we can use colnames with grepl to use regex pattern matching:
colnames(df)[grepl("old", colnames(df))] <- "new_column"
- …we can also use names with the same grepl approach:
names(df)[grepl("old", names(df))] <- "new_column"
- We can swap the first names with colnames:
colnames(df)[grepl("old", names(df))] <- "new_column"
- Flip it and reverse it…

names(df)[grepl("old", colnames(df))] <- "new_column"
- Using grep + names:
names(df)[grep("old", names(df))] <- "new_column"
- Using grep + colnames:
colnames(df)[grep("old", colnames(df))] <- "new_column"
- Using grep + names then colnames:
names(df)[grep("old", colnames(df))] <- "new_column"
- Using grep + colnames then names:
  - (I am intentionally stopping myself from more Missy Elliott references.)
colnames(df)[grep("old", names(df))] <- "new_column"
- Using sub + colnames:
colnames(df) <- sub("old_column", "new_column", colnames(df))
- Using sub + names:
names(df) <- sub("old_column", "new_column", names(df))
- Using sub + names then colnames:
names(df) <- sub("old_column", "new_column", colnames(df))
- Using sub + colnames then names:
colnames(df) <- sub("old_column", "new_column", names(df))
- Using gsub + colnames:
colnames(df) <- gsub("old_column", "new_column", colnames(df))
- Using gsub + names:
names(df) <- gsub("old_column", "new_column", names(df))
- Using gsub + names then colnames:
names(df) <- gsub("old_column", "new_column", colnames(df))
- Using gsub + colnames then names:
colnames(df) <- gsub("old_column", "new_column", names(df))
- Using a for loop with colnames:
for (i in paste0("new_column")) {
  colnames(df) <- i
}
- Using a for loop with names:
for (i in paste0("new_column")) {
  names(df) <- i
}
- Using setNames:
df <- setNames(df, "new_column")
- Using eval and parse with names:
eval(parse(text = 'names(df) <- "new_column"'))
- Using eval and parse with colnames:
eval(parse(text = 'colnames(df) <- "new_column"'))
- Using setNames and replace:
# setNames returns a renamed copy, so assign the result back to df
df <- setNames(df, replace(names(df), names(df) == 'old_column', 'new_column'))
- Using transform:
df <- transform(df, new_column = old_column, old_column = NULL)
tidyverse
You can learn more about the tidyverse here
- Using rename without a %>%:
df <- rename(df, "new_column" = "old_column")
- Using rename with a %>%:
df <- df %>%
  rename("new_column" = "old_column")
- Renaming in a select call without a %>%:
df <- select(df, "new_column" = "old_column")
- Renaming in a select call with a %>%:
df <- df %>%
  select("new_column" = "old_column")
- Using mutate to create a new column and then removing the old_column:
df <- df %>%
  mutate(new_column = old_column) %>%
  select(-old_column)
- Using mutate to create a new column and then removing the old_column without pipes (%>%):
df <- mutate(df, new_column = old_column)
df$old_column <- NULL
- Using purrr + set_names and str_replace_*:
df <- df %>%
  set_names(~(.) %>%
    str_replace_all("old_column", "new_column"))
- Using a character vector and rename:
rename_vec <- c("new_column" = "old_column")
# all_of() tells tidyselect that rename_vec is an external character vector
df <- df %>%
  rename(all_of(rename_vec))
- Using str_replace + names:
names(df) <- str_replace(names(df), "old_column", "new_column")
- Using str_replace + colnames:
colnames(df) <- str_replace(colnames(df), "old_column", "new_column")
- Using starts_with:
df <- df %>%
  select("new_column" = starts_with("old"))
- Using ends_with:
df <- df %>%
  select("new_column" = ends_with("column"))
- Using rename_with + gsub:
df <- df %>%
  rename_with(~gsub("old_", "new_", .x))
- Using rename_with + sub:
df <- df %>%
  rename_with(~sub("old_", "new_", .x))
- Using rename_with and str_replace:
# str_replace takes the string first, then the pattern, then the replacement
df <- df %>%
  rename_with(~str_replace(.x, "old_column", "new_column"))
- Using rename with an index:
df <- df %>%
  rename("new_column" = 1)
A note: I’m going to stop interchanging names and colnames as I did previously. I didn’t have any idea how many ways there would be to rename columns when I started this, but it’s becoming evident that there are likely hundreds of ways if we count every nuance.
I’m also throwing in the towel on the deprecated/superseded rename_at / rename_if / rename_all functions, since they have been replaced by select and rename_with.
data.table
data.table is really fast, and you can… do cool stuff with it. I am a data.table n00b. You can learn more about data.table here.
- Using data.table::setnames:
df <- as.data.table(df, keep.rownames = FALSE)
setnames(df, "old_column", "new_column")
- Using data.table::setnames with an index:
df <- as.data.table(df, keep.rownames = FALSE)
setnames(df, 1, "new_column")
- Refactoring the previous data.table example (I have no idea what I’m doing 😅)
# Assign the result back to df so the renamed column is kept
df <- as.data.table(df)[, .(new_column = old_column)]
What’s in a (re)name?
R is an amazing language and there are endless things you can do. Coming from SPSS, I was previously familiar with rename and just left it at that. I had some grand ideas of microbenchmarking each of these methods to find the fastest renaming solution, and maybe that will happen someday if I get an espresso machine or something. ☕
Our team at work will be transitioning from SPSS to R, and this has given me a lot to think about, specifically about the importance of having standardized code, but also having some built-in flexibility for each person’s coding style. I’m looking forward to another version of this post, where I focus on a task that is slightly more complicated. Maybe iterating through a data.frame column/rowwise?
I also acknowledge my severe lack of data.table knowledge. I don’t work with big data, and am not in a position to need to make production-level code performant. tidyverse code is way more intuitive for me, and the community is really supportive and engaged, so I will likely leave data.table off the …table for a while.
… I’ll see myself out.
Cats
References
- https://stackoverflow.com/questions/7531868/how-to-rename-a-single-column-in-a-data-frame
- https://stackoverflow.com/questions/35084427/how-to-change-column-names-in-dataframe-in-the-loop
- https://stackoverflow.com/questions/50687741/how-to-rename-column-headers-in-r
- https://stackoverflow.com/questions/46616591/rename-multiple-dataframe-columns-using-purrr
- https://stackoverflow.com/questions/20987295/rename-multiple-columns-by-names
- https://stackoverflow.com/questions/9283171/rename-multiple-dataframe-columns-referenced-by-current-names/9292258
- https://stackoverflow.com/questions/53168572/how-to-rename-specific-variable-of-a-data-frame-with-setnames