I haven’t posted for a while, and came across a tweet from Angie Jones that I really related to.

Not that my previous posts were intellectual think pieces, but I thought I had to write about something novel or innovative to provide any value.

When I first started using R, my code was a mash-up of base R, dplyr, and data.table. I would reference a column by index in one line and by name in the next. It was hard for me to follow, and I cringe at the idea that I sent some of this old code to colleagues.

I was trying to think of how many ways there are to do simple data cleaning tasks in R, and thought it would be fun to explore.

The only task accomplished in the rest of this post is renaming a column (plus some pics of my cats).

Original column name: old_column
Renamed column name: new_column

Every example uses a data.frame called df with a single column, old_column, which we will rename to new_column:

old_column
Here
is
a
column
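The post never shows how df is built. Here is a minimal sketch, assuming the four values in the table above are stored as character strings:

```r
# Build the example data.frame: one character column named old_column
df <- data.frame(
  old_column = c("Here", "is", "a", "column"),
  stringsAsFactors = FALSE  # keep strings as character (already the default since R 4.0.0)
)

names(df)
# "old_column"
```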

Using `Base R`

The following examples will only use base R, meaning no additional packages will be required to run this code.

Call colnames on df and index the first column.

colnames(df)[1] <- "new_column"

Call names on df and index the first column.

names(df)[1] <- "new_column"

Call colnames on df and select the column by matching its name, also using colnames.

colnames(df)[colnames(df) == "old_column"] <- "new_column"

Call names on df and select the column by matching its name, also using names.

names(df)[names(df) == "old_column"] <- "new_column"

Call colnames on df and select the column by matching its name using names.

colnames(df)[names(df) == "old_column"] <- "new_column"

Call names on df and select the column by matching its name using colnames.

names(df)[colnames(df) == "old_column"] <- 'new_column'

Call colnames on df and subset using which. Instead of a logical vector, which returns the integer index of the column equal to “old_column”.

colnames(df)[which(colnames(df) == "old_column")] <- "new_column"
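With a single column, the logical comparison and the which index both point at position 1, so it is hard to see the difference. This sketch uses a hypothetical two-column data.frame, df2, with an extra id column that is not part of the original example:

```r
# Hypothetical two-column data.frame, so the index is not simply 1
df2 <- data.frame(id = 1:4, old_column = c("Here", "is", "a", "column"))

# The comparison returns one TRUE/FALSE per column name: c(FALSE, TRUE)
colnames(df2) == "old_column"

# which() converts that logical vector into integer positions: 2
which(colnames(df2) == "old_column")

colnames(df2)[which(colnames(df2) == "old_column")] <- "new_column"
colnames(df2)
# c("id", "new_column")
```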

Since df only has one column, we can also call names on df:

names(df) <- "new_column"

…or colnames on df:

colnames(df) <- "new_column"

We can also use a different, less efficient approach. Instead of renaming the column, we can create a copy of old_column called new_column and then remove old_column from df:

# Create a new column called "new_column" that is an exact copy of "old_column"
df$new_column <- df$old_column

# Remove "old_column"
df$old_column <- NULL
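One side effect of the copy-and-delete approach only shows up when there is more than one column: the copy is appended at the end, so the column changes position. Here is a sketch using a hypothetical df2 with an extra id column:

```r
# Hypothetical data.frame with a second column after old_column
df2 <- data.frame(old_column = c("Here", "is", "a", "column"), id = 1:4)

df2$new_column <- df2$old_column  # the copy is appended as the last column
df2$old_column <- NULL            # assigning NULL drops the original column

names(df2)
# c("id", "new_column"): the renamed column moved from first to last
```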

Getting a bit more abstract, we can combine colnames with grepl to use regex pattern matching:

colnames(df)[grepl("old", colnames(df))] <- "new_column"

…we can also use names with grepl:

names(df)[grepl("old", names(df))] <- "new_column"

We can swap the outer names for colnames:

colnames(df)[grepl("old", names(df))] <- "new_column"

Flip it and reverse it…

names(df)[grepl("old", colnames(df))] <- "new_column"
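Because grepl matches anywhere in a string, a loose pattern like "old" can hit more columns than intended. If it does, every matching column is assigned "new_column", which leaves you with duplicate names. This sketch uses a hypothetical df2 with a gold_medal column and shows how anchoring the pattern with ^ limits the match to names that start with "old":

```r
# Hypothetical data.frame: "gold_medal" also contains the substring "old"
df2 <- data.frame(old_column = c("Here", "is", "a", "column"), gold_medal = 1:4)

grepl("old", colnames(df2))
# c(TRUE, TRUE): both names match the unanchored pattern

# ^ anchors the pattern to the start of the name, so only old_column matches
colnames(df2)[grepl("^old", colnames(df2))] <- "new_column"
colnames(df2)
# c("new_column", "gold_medal")
```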

Using grep + names:

names(df)[grep("old", names(df))] <- "new_column"

Using grep + colnames:

colnames(df)[grep("old", colnames(df))] <- "new_column"

Using grep + names then colnames:

names(df)[grep("old", colnames(df))] <- "new_column"

Using grep + colnames then names:

(I am intentionally stopping myself from more Missy Elliott references.)

colnames(df)[grep("old", names(df))] <- "new_column"
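The grepl and grep versions give the same result here because R accepts both logical and integer subscripts. The difference is only in what each function returns, as this sketch with a hypothetical vector of names shows:

```r
x <- c("id", "old_column")  # hypothetical column names

grepl("old", x)               # logical, one value per element: c(FALSE, TRUE)
grep("old", x)                # integer positions of the matches: 2
grep("old", x, value = TRUE)  # the matching strings themselves: "old_column"
```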

Using sub + colnames:

colnames(df) <- sub("old_column", "new_column", colnames(df))

Using sub + names:

names(df) <- sub("old_column", "new_column", names(df))

Using sub + names then colnames:

names(df) <- sub("old_column", "new_column", colnames(df))

Using sub + colnames then names:

colnames(df) <- sub("old_column", "new_column", names(df))

Using gsub + colnames:

colnames(df) <- gsub("old_column", "new_column", colnames(df))

Using gsub + names:

names(df) <- gsub("old_column", "new_column", names(df))

Using gsub + names then colnames:

names(df) <- gsub("old_column", "new_column", colnames(df))

Using gsub + colnames then names:

colnames(df) <- gsub("old_column", "new_column", names(df))
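For this data, the sub and gsub versions are interchangeable because "old_column" appears at most once in each name. They differ only when the pattern repeats within a single string: sub replaces the first match and gsub replaces every match. A sketch with a hypothetical name:

```r
x <- "old_column_old"  # hypothetical name containing "old" twice

sub("old", "new", x)   # first match only: "new_column_old"
gsub("old", "new", x)  # every match:      "new_column_new"
```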

Using a for loop with colnames:

for (i in paste0("new_column")){
  colnames(df) <- i
}

Using a for loop with names:

for (i in paste0("new_column")){
  names(df) <- i
}

Using setNames:

df <- setNames(df, "new_column")

Using eval and parse with names:

eval(parse(text = 'names(df) <- "new_column"'))

Using eval and parse with colnames:

eval(parse(text = 'colnames(df) <- "new_column"'))

Using setNames and replace:

df <- setNames(df, replace(names(df), names(df) == "old_column", "new_column"))

Using transform:

df <- transform(df, new_column = old_column, old_column = NULL)

tidyverse

You can learn more about the tidyverse here

Using rename without a %>%:

library(tidyverse)  # attaches dplyr, purrr, stringr, and the %>% pipe

df <- rename(df, "new_column" = "old_column")

Using rename with a %>%:

df <- df %>% 
  rename("new_column" = "old_column")

Renaming in a select call without a %>%:

df <- select(df, "new_column" = "old_column")

Renaming in a select call with a %>%:

df <- df %>% 
  select("new_column" = "old_column")
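The select versions only behave like rename because df has a single column. select keeps only the columns it names, while rename keeps everything. Here is a sketch with a hypothetical two-column df2, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical two-column data.frame
df2 <- data.frame(id = 1:4, old_column = c("Here", "is", "a", "column"))

names(rename(df2, new_column = old_column))
# c("id", "new_column"): rename keeps every column

names(select(df2, new_column = old_column))
# "new_column": select drops columns it is not told about

names(select(df2, id, new_column = old_column))
# c("id", "new_column"): list the other columns to keep them
```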

Using mutate to create a new column and then removing the old_column:

df <- df %>% 
  mutate(new_column = old_column) %>% 
  select(-old_column)

Using mutate to create a new column and then removing the old_column without pipes (%>%):

df <- mutate(df, new_column = old_column)
df$old_column <- NULL

Using purrr::set_names and stringr::str_replace_all:

df <- df %>%
    set_names(~(.) %>%
                  str_replace_all("old_column", "new_column"))

Using a character vector and rename:

rename_vec <- c("new_column" = "old_column")

# all_of() marks rename_vec as an external character vector for tidyselect
df <- df %>% 
  rename(all_of(rename_vec))
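A named lookup vector scales to several columns at once. Here is a sketch that assumes a recent dplyr release and uses a hypothetical df2 plus a hypothetical lookup entry for a column that does not exist. any_of() silently skips names that are missing from the data, whereas all_of() would raise an error:

```r
library(dplyr)

# Hypothetical data.frame with a second column
df2 <- data.frame(old_column = c("Here", "is", "a", "column"), id = 1:4)

# Names are the new column names, values are the existing ones;
# "not_in_df" is deliberately absent from df2
lookup <- c(new_column = "old_column", row_id = "id", extra = "not_in_df")

# all_of(lookup) would error here because "not_in_df" does not exist
df2 <- rename(df2, any_of(lookup))
names(df2)
# c("new_column", "row_id")
```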

Using str_replace + names:

names(df) <- str_replace(names(df), "old_column", "new_column")

Using str_replace + colnames:

colnames(df) <- str_replace(colnames(df), "old_column", "new_column")

Using starts_with:

df <- df %>% 
  select("new_column" = starts_with("old"))

Using ends_with:

df <- df %>% 
  select("new_column" = ends_with("column"))

Using rename_with + gsub:

df <- df %>% 
  rename_with(~gsub("old_", "new_", .x))

Using rename_with + sub:

df <- df %>% 
  rename_with(~sub("old_", "new_", .x))

Using rename_with and str_replace:

df <- df %>% 
  rename_with(~str_replace(.x, "old_column", "new_column"))
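str_replace() takes its arguments in the order string, pattern, replacement, so the current column names (.x) belong in the first slot. If the literal "new_column" were passed first, str_replace would return "new_column" for every column, including columns that should keep their names. Here is a sketch with a hypothetical two-column df2, assuming dplyr and stringr are installed:

```r
library(dplyr)
library(stringr)

# Hypothetical two-column data.frame
df2 <- data.frame(id = 1:4, old_column = c("Here", "is", "a", "column"))

# .x holds the current column names and is the string being edited
df2 <- rename_with(df2, ~ str_replace(.x, "old_column", "new_column"))
names(df2)
# c("id", "new_column"): names without a match pass through unchanged
```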

Rename with an index:

df <- df %>% 
  rename("new_column" = 1)

A note: I’m going to stop interchanging names and colnames as I did previously. I didn’t have any idea how many ways there would be to rename columns when I started this, but it’s becoming evident that there are likely hundreds of ways if we count every nuance.

I’m also throwing in the towel on the superseded rename_at / rename_if / rename_all functions, since rename_with has replaced them (with select covering the select-and-rename cases).

data.table

data.table is really fast, and you can… do cool stuff with it. I am a data.table n00b. You can learn more about data.table here.

Using data.table::setnames:

library(data.table)

df <- as.data.table(df, keep.rownames = FALSE)
setnames(df, "old_column", "new_column")
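Unlike most of the base R and tidyverse examples, setnames does not need a `df <-` assignment, because it modifies the data.table in place (by reference). Here is a sketch, assuming data.table is installed. It also uses setDT, which converts an existing data.frame by reference instead of making a copy the way as.data.table does:

```r
library(data.table)

dt <- data.table(old_column = c("Here", "is", "a", "column"))

# setnames() changes dt in place; no reassignment is needed
setnames(dt, "old_column", "new_column")
names(dt)
# "new_column"

# setDT() turns a data.frame into a data.table by reference
df <- data.frame(old_column = c("Here", "is", "a", "column"))
setDT(df)
setnames(df, "old_column", "new_column")
```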

Using data.table::setnames with an index:

df <- as.data.table(df, keep.rownames = FALSE)
setnames(df, 1, "new_column")

Selecting and renaming in a single j expression, which returns a new data.table (I have no idea what I’m doing 😅):

df <- as.data.table(df)[, .(new_column = old_column)]

What’s in a (re)name?

R is an amazing language and there are endless things you can do. Coming from SPSS, I was previously familiar with rename and just left it at that. I had some grand ideas of microbenchmarking each of these methods to find the fastest renaming solution, and maybe that will happen someday if I get an espresso machine or something. ☕

Our team at work will be transitioning from SPSS to R, and this has given me a lot to think about, specifically about the importance of having standardized code, but also having some built-in flexibility for each person’s coding style. I’m looking forward to another version of this post, where I focus on a task that is slightly more complicated. Maybe iterating through a data.frame column/rowwise?

I also acknowledge my severe lack of data.table knowledge. I don’t work with big data, and am not in a position to need to make production-level code performant. tidyverse code is way more intuitive for me, and the community is really supportive and engaged, so I will likely leave data.table off the …table for a while.

… I’ll see myself out.

52 Different Ways to Rename a Column in R
