Thomas Wutzler 2006/09/14

Sorting a dataframe by one column

You can sort a dataframe using subsetting and the order function.
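To see why this works, it helps to look at what ‘order’ returns: not the sorted values themselves, but the index positions that would put them in ascending order. A minimal sketch with a small made-up vector (the names ‘temps’ and ‘df’ are illustrative only):

```r
# order() returns the positions of the elements in ascending order
temps <- c(30, 10, 20)        # hypothetical example values
idx <- order(temps)           # 2 3 1: the smallest value sits at position 2
temps[idx]                    # 10 20 30

# The same index vector reorders the rows of a data frame
df <- data.frame(temp = temps, id = c("a", "b", "c"))
df[idx, ]                     # rows come out in the order 2, 3, 1
```

Because the result is just an index vector, it can be used in the row position of ‘[’ for any data frame that has one row per element of the sorted vector.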

Using the built-in dataset airquality, first load the data and check what variables it contains:

data(airquality)
names(airquality)

To sort the airquality data frame by ascending temperature, write:

airquality2 <- airquality[order(airquality$Temp), ]
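The sorted copy keeps the row names of airquality, so each row still shows where it came from in the original data. A quick sanity check using base R only; resetting the row names at the end is optional and only a matter of taste:

```r
data(airquality)
airquality2 <- airquality[order(airquality$Temp), ]

# Temperatures are now non-decreasing from top to bottom
stopifnot(!is.unsorted(airquality2$Temp))

# Row names still refer to the original row positions
head(rownames(airquality2))

# If you prefer fresh row names 1, 2, 3, ..., drop the old ones
rownames(airquality2) <- NULL
```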

You can pass additional arguments to the order function to sort in descending order or to fine-tune how missing values (NA) are handled:

airquality3 <- airquality[order(airquality$Temp,
    decreasing = TRUE, na.last = TRUE), ]
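The Temp column has no missing values, so ‘na.last’ makes no visible difference there. The Ozone column of airquality does contain NAs, which makes it a convenient place to compare the three settings of ‘na.last’ (a sketch using base R only; the object names are illustrative):

```r
data(airquality)
oz <- airquality$Ozone

# na.last = TRUE (the default): rows with missing Ozone go to the end
last    <- airquality[order(oz, na.last = TRUE), ]
# na.last = FALSE: rows with missing Ozone come first
first   <- airquality[order(oz, na.last = FALSE), ]
# na.last = NA: rows with missing Ozone are dropped entirely
dropped <- airquality[order(oz, na.last = NA), ]

sum(is.na(oz))                          # number of missing Ozone values
c(nrow(airquality), nrow(dropped))      # the difference is that number
```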

Sorting a dataframe by multiple columns

Sorting a dataframe by multiple columns can also be accomplished using the order() function:

 > dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"), 
 +    levels = c("Low", "Med", "Hi"), ordered = TRUE),
 +    x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
 +    z = c(1, 1, 1, 2))
 > with(dd, dd[order(z, x), ])
    b x y z
1  Hi A 8 1
3  Hi A 9 1
2 Med D 3 1
4 Low C 9 2

For descending sorts, use ‘-’ for numeric and ‘-xtfrm(x)’ for factors/character:

 > with(dd, dd[order(-z, -xtfrm(x)), ])
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1

Here, ‘xtfrm’ is a base R function that returns a numeric vector which sorts in the same order as x, so negating it reverses the order even when x is a factor or a character vector.
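Whether the column is a factor or a character vector changes the key that ‘xtfrm’ produces: for a factor it uses the level order, for a character vector it uses alphabetical order. A small sketch comparing both on the values of column b (the names ‘b_fac’ and ‘b_chr’ are illustrative):

```r
b_fac <- factor(c("Hi", "Med", "Hi", "Low"),
                levels = c("Low", "Med", "Hi"), ordered = TRUE)
b_chr <- as.character(b_fac)

xtfrm(b_fac)   # level codes (Low = 1, Med = 2, Hi = 3): 3 2 3 1
xtfrm(b_chr)   # alphabetical ranks (Hi < Low < Med):     1 4 1 3

# Negating either key reverses its own sort order
b_fac[order(-xtfrm(b_fac))]   # Hi, Hi, Med, Low (by level)
b_chr[order(-xtfrm(b_chr))]   # "Med" "Low" "Hi" "Hi" (alphabetical)
```

So a descending sort on a factor follows its levels, which is usually what you want for ordered categories such as Low/Med/Hi.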

Note that the occasionally suggested idea of using ‘rev(x)’ to sort in descending order is wrong. Consider the following counterexample:

 > dd2 <- as.data.frame(list(x=c('b','a','c')))
 > with(dd2, dd2[order(rev(x)), ])
[1] "a" "c" "b"

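Technical note: ‘order(rev(x))’ does sort the reversed vector correctly, but the indices it returns refer to positions in ‘rev(x)’, not in x. Using them to index the original data frame therefore picks the wrong rows: position 1 of ‘rev(x)’ is position 3 of x and vice versa, so the elements “b” and “c” end up swapped relative to the ascending order, and the result is neither ascending nor descending.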
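For a single key, any of the following gives a genuinely descending order of the same three-element vector. One difference to keep in mind: ‘rev(order(x))’ also reverses the relative order of tied elements, which matters if you rely on a stable sort.

```r
x <- c("b", "a", "c")

order(rev(x))                    # 2 3 1: indices into rev(x), not into x
x[order(rev(x))]                 # "a" "c" "b": neither ascending nor descending

# Correct ways to get x in descending order
x[order(x, decreasing = TRUE)]   # "c" "b" "a"
x[rev(order(x))]                 # "c" "b" "a"
x[order(-xtfrm(x))]              # "c" "b" "a"
```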

A unified method for sorting data frames with mixed data types is provided by the following function by Kevin Wright.

sort.data.frame <- function(x, by){
    # Author: Kevin Wright
    # with some ideas from Andy Liaw
    # http://tolstoy.newcastle.edu.au/R/help/04/07/1076.html
 
    # x: A data.frame
    # by: A one-sided formula using + for ascending and - for descending
    #     Sorting is left to right in the formula
  
    # Usage is:
    # library(nlme);
    # data(Oats)
    # sort(Oats, by= ~nitro-Variety)
 
    # A one-sided formula has length 2 (the ~ and its right-hand side)
    if(!inherits(by, "formula") || length(by) != 2)
        stop("Argument 'by' must be a one-sided formula.")
 
    # Make the right-hand side into character and remove spaces
    formc <- paste(deparse(by[[2]]), collapse = "")
    formc <- gsub(" ", "", formc)
    # If the first character is not + or -, add +
    if(!is.element(substring(formc, 1, 1), c("+", "-")))
        formc <- paste("+", formc, sep = "")
 
    # Extract the variables from the formula
    vars <- unlist(strsplit(formc, "[\\+\\-]"))
    vars <- vars[vars != ""] # Remove any extra "" terms
 
    # Build a list of arguments to pass to "order" function
    calllist <- list()
    pos <- 1 # Position of + or -
    for(i in seq_along(vars)){
        varsign <- substring(formc, pos, pos)
        pos <- pos + 1 + nchar(vars[i])
        # xtfrm() turns numeric, factor and character columns alike into
        # a numeric sort key, so negating it always reverses the order
        # (unary minus on a character column would be an error)
        key <- xtfrm(x[[vars[i]]])
        if(varsign == "-") {
            calllist[[i]] <- -key
        } else {
            calllist[[i]] <- key
        }
    }
    # drop = FALSE keeps a one-column result as a data frame
    return(x[do.call("order", calllist), , drop = FALSE])
}

With this function defined and dd being the four-row data frame from the multiple-column example, sort descending by z and ascending by b:

 > sort(dd, by = ~ -z + b)
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1
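In more recent versions of R, the same result can be obtained with base R alone: when method = "radix" is used, ‘order’ accepts a ‘decreasing’ vector with one entry per sort key. A sketch, assuming R 3.3.0 or later (as far as I know, the version in which ‘order’ gained the radix method); note that the radix method compares character strings in the C locale rather than by the current locale's collation:

```r
dd <- data.frame(b = factor(c("Hi", "Med", "Hi", "Low"),
                            levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 x = c("A", "D", "A", "C"), y = c(8, 3, 9, 9),
                 z = c(1, 1, 1, 2))

# Descending by z, then ascending by b (a factor sorts by its level order)
res <- with(dd, dd[order(z, b, decreasing = c(TRUE, FALSE),
                         method = "radix"), ])
res   # same row order as sort(dd, by = ~ -z + b): 4, 2, 1, 3
```

This avoids both the negation trick and the formula parsing, at the cost of the locale caveat for character keys.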
 
tips/data-frames/sort.txt · Last modified: 2013/04/22 by patrick.roocks
 
Recent changes RSS feed R Wiki powered by Driven by DokuWiki and optimized for Firefox Creative Commons License