— Thomas Wutzler 2006/09/14
You can sort a dataframe using subsetting and the order function.
Using the built-in dataset airquality, first load the data and check what variables it contains:
data(airquality)
names(airquality)
To sort the airquality data frame by ascending temperature, write:
airquality2 <- airquality[order(airquality$Temp), ]
You can pass additional arguments to the order function to sort in descending order or to fine-tune the handling of missing values (NA):
airquality3 <- airquality[order(airquality$Temp, decreasing = TRUE, na.last = TRUE), ]
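As a rough illustration of the na.last argument, the sketch below sorts by Ozone, a column of airquality that does contain missing values. The names by_ozone_last, by_ozone_first and by_ozone_drop are made up for this example.

```r
data(airquality)

# Rows with a missing Ozone value go to the end (the default of order)
by_ozone_last  <- airquality[order(airquality$Ozone, na.last = TRUE), ]

# Rows with a missing Ozone value go to the beginning
by_ozone_first <- airquality[order(airquality$Ozone, na.last = FALSE), ]

# Rows with a missing Ozone value are left out of the result
by_ozone_drop  <- airquality[order(airquality$Ozone, na.last = NA), ]

# The last variant keeps exactly the rows where Ozone is not NA
nrow(by_ozone_drop) == sum(!is.na(airquality$Ozone))
```

With na.last = NA the sorted data frame is shorter than the original, so use it only when dropping incomplete rows is what you want.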
Sorting a dataframe by multiple columns can also be accomplished using the order() function:
> dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
+       levels = c("Low", "Med", "Hi"), ordered = TRUE),
+       x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
+       z = c(1, 1, 1, 2))
> with(dd, dd[order(z, x), ])
    b x y z
1  Hi A 8 1
3  Hi A 9 1
2 Med D 3 1
4 Low C 9 2
For descending sorts, use ‘-’ for numeric columns and ‘-xtfrm(x)’ for factor or character columns:
> with(dd, dd[order(-z, -xtfrm(x)), ])
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1
Here, ‘xtfrm’ is a built-in auxiliary function of R that returns a numeric vector which sorts in the same order as x.
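To see what ‘xtfrm’ produces, the following small sketch applies it to a factor and to a character vector taken from the example data. The names b_key and x_key are only chosen for this illustration.

```r
b <- factor(c("Hi", "Med", "Hi", "Low"),
            levels = c("Low", "Med", "Hi"), ordered = TRUE)
x <- c("A", "D", "A", "C")

# For a factor, xtfrm returns the integer level codes
b_key <- xtfrm(b)   # 3 2 3 1

# For a character vector, it returns numbers that sort like the strings
x_key <- xtfrm(x)

# Negating the numeric key reverses the sort direction
x[order(-x_key)]    # "D" "C" "A" "A"
as.character(b[order(-b_key)])
```

Because the key is numeric, the unary minus works even though it is not defined for the factor or the character vector itself.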
Note that the idea, sometimes seen, of using ‘rev(x)’ inside order() to sort in descending order is wrong. Consider the following counterexample:
> dd <- as.data.frame(list(x=c('b','a','c')))
> with(dd, dd[order(rev(x)), ])
[1] a c b
Technical note: order(rev(x)) returns the permutation that sorts the reversed vector in ascending order, but this permutation is then applied to the original, unreversed x. Because ‘rev’ maps position 1 to position 3 and vice versa, the result is the ascending sort “a b c” with the elements “b” and “c” swapped, not the descending sort “c b a”.
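The pitfall can be reproduced on a plain vector. The sketch below contrasts the wrong call with two alternatives that do give a descending sort; the names wrong, desc1 and desc2 are only chosen for this illustration.

```r
x <- c("b", "a", "c")

# Wrong: the permutation is computed for rev(x) but applied to x
wrong <- x[order(rev(x))]               # "a" "c" "b"

# Descending sort via the decreasing argument of order
desc1 <- x[order(x, decreasing = TRUE)] # "c" "b" "a"

# Descending sort by reversing the permutation itself, not the data
desc2 <- x[rev(order(x))]               # "c" "b" "a"
```

Note that rev(order(x)) also reverses the relative order of tied elements, which matters once further columns of a data frame are carried along.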
A unified method for sorting data frames with mixed data types is provided by the following function by Kevin Wright.
sort.data.frame <- function(x, by){
  # Author: Kevin Wright
  # with some ideas from Andy Liaw
  # http://tolstoy.newcastle.edu.au/R/help/04/07/1076.html

  # x: A data.frame
  # by: A one-sided formula using + for ascending and - for descending
  #     Sorting is left to right in the formula

  # Usage is:
  # library(nlme);
  # data(Oats)
  # sort(Oats, by= ~nitro-Variety)

  if(by[[1]] != "~")
    stop("Argument 'by' must be a one-sided formula.")

  # Make the formula into character and remove spaces
  formc <- as.character(by[2])
  formc <- gsub(" ", "", formc)
  # If the first character is not + or -, add +
  if(!is.element(substring(formc, 1, 1), c("+", "-")))
    formc <- paste("+", formc, sep = "")

  # Extract the variables from the formula
  vars <- unlist(strsplit(formc, "[\\+\\-]"))
  vars <- vars[vars != ""] # Remove any extra "" terms

  # Build a list of arguments to pass to "order" function
  calllist <- list()
  pos <- 1 # Position of + or -
  for(i in 1:length(vars)){
    varsign <- substring(formc, pos, pos)
    pos <- pos + 1 + nchar(vars[i])
    if(is.factor(x[, vars[i]])){
      if(varsign == "-") {
        calllist[[i]] <- -rank(x[, vars[i]])
      } else {
        calllist[[i]] <- rank(x[, vars[i]])
      }
    } else {
      if(varsign == "-") {
        calllist[[i]] <- -x[, vars[i]]
      } else {
        calllist[[i]] <- x[,vars[i]]
      }
    }
  }
  return(x[do.call("order", calllist), ])
}
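To sort the four-column data frame dd defined earlier (not the one-column dd from the ‘rev’ counterexample) descending by z and ascending by b: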
> sort(dd, by = ~ -z + b)
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1
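For comparison, the same mixed ascending and descending sort can also be sketched with order() alone. This is an alternative to the function above, not part of it, and it assumes a version of R in which order() accepts method = "radix" together with one decreasing flag per sort key. The names idx and sorted are only chosen for this illustration.

```r
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                            levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 x = c("A", "D", "A", "C"),
                 y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))

# One decreasing flag per key: z descending, b ascending.
# A vector for 'decreasing' is only accepted with method = "radix".
idx <- with(dd, order(z, b, decreasing = c(TRUE, FALSE), method = "radix"))
sorted <- dd[idx, ]

rownames(sorted)   # "4" "2" "1" "3"
```

The row order matches the output of sort(dd, by = ~ -z + b) shown above.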