Removing rows from a data frame

Thomas Wutzler 2006/09/14

Using the built-in dataset airquality, first load the data and check what variables it contains:

data(airquality)
names(airquality)

In the S language there are two usual ways to remove rows from a data frame. Either you keep all the other rows by indexing with a vector of logical values, or you drop the unwanted rows by indexing with a vector of negative indices.

Subset by logical indices

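Assume that the measurements from days 5 and 7 of May are outliers and you want to repeat an analysis without these rows. subset() keeps the rows for which the logical condition is TRUE, so you negate the condition that identifies the outliers: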

length(airquality$Day)
airquality2 <- subset(airquality, !(Day %in% c(5, 7) & Month == 5))
length(airquality2$Day)
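
The same selection can also be written with plain bracket indexing, which makes the logical vector explicit. This is only a minimal sketch; the names drop_row and airquality2b are made up for illustration and do not appear in the original tip.

```r
data(airquality)

# TRUE for the rows to drop: days 5 and 7 in May
drop_row <- airquality$Day %in% c(5, 7) & airquality$Month == 5

# Keep every row where drop_row is FALSE
airquality2b <- airquality[!drop_row, ]

nrow(airquality2b)
```

Day and Month contain no missing values in airquality, so this should give the same rows as the subset() call. If the condition could evaluate to NA, the bracket form returns extra rows filled with NA, whereas subset() drops those rows.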

Subset by negative indices

Similarly, you can delete rows by position. To delete rows 2 and 7, you write:

length(airquality$Day)
airquality3 <- airquality[-c(2, 7), ] 
length(airquality3$Day)
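
To check which rows were actually removed, one option is to compare row names before and after. The sketch below assumes the default row names of airquality ("1" to "153"); it is an illustration, not part of the original tip.

```r
data(airquality)

# Drop the rows at positions 2 and 7
airquality3 <- airquality[-c(2, 7), ]

# Row names keep their original labels, so "2" and "7" are now missing
head(rownames(airquality3))

# setdiff() lists the labels that were dropped
dropped <- setdiff(rownames(airquality), rownames(airquality3))
dropped
```

Note that negative indices always refer to positions in the data frame being indexed, not to row names, so after this step position 2 of airquality3 holds the row labelled "3".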

Traps for beginners

Claudia Beleites 2008/01/02

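Be careful to match the operator to the type of index vector: use ! with a logical vector and - with a numeric vector of positions:

new.data <- data[!outliers, ]  # outliers is a logical vector (TRUE = remove)
new.data <- data[-outliers, ]  # outliers is a numeric vector of row positions

Applying the numeric form to a logical index vector deletes only the first element. Negating a logical vector turns FALSE into 0 and TRUE into -1; zeros are ignored, so the only effective index is -1. The example uses a plain vector, but rows of a data frame behave the same way: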

 > data <- 1:10
 > outliers <- data %in% 3:7
 > new.data <- data[!outliers]  # (desired effect)
 > new.data
 [1]  1  2  8  9 10 
 > new.data <- data[-outliers]  # (wrong code)
 > new.data
 [1]  2  3  4  5  6  7  8  9 10
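
The following sketch makes the coercion visible and shows one way to use the negative form safely by converting the logical vector to positions with which(). The variable names neg, wrong, by_position and by_logical are chosen here for illustration only.

```r
data <- 1:10
outliers <- data %in% 3:7

# Unary minus coerces the logical vector to integers: FALSE -> 0, TRUE -> -1
neg <- -outliers
print(neg)

# Zeros are ignored and the repeated -1 drops only element 1
wrong <- data[neg]
print(wrong)

# which() turns the logical vector into positions, so the negative form works
by_position <- data[-which(outliers)]
print(by_position)

# The logical form gives the same result
by_logical <- data[!outliers]
print(by_logical)
```

One caveat with the which() variant: if no element is flagged, which() returns an empty vector and data[-which(outliers)] then selects nothing at all, so the logical form is usually the safer default.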
 
tips/data-frames/remove_rows_data_frame.txt · Last modified: 2008/01/05
 