— *Thomas Wutzler 2006/09/14*

Using the built-in dataset `airquality`, first load the data and check which variables it contains:

    data(airquality)
    names(airquality)

There are two usual ways to remove rows from a data frame in the S language. Either you keep all other rows by indexing with a vector of logical values, or you drop the unwanted rows by indexing with a vector of negative indices.

Assume that days 5 and 7 in May of the `airquality` measurements are outliers and you want to repeat an analysis without these rows. You write:

    length(airquality$Day)
    airquality2 <- subset(airquality, !(Day %in% c(5, 7) & Month == 5))
    length(airquality2$Day)

Similarly, you can delete specific rows by position. To delete rows 2 and 7, you write:

    length(airquality$Day)
    airquality3 <- airquality[-c(2, 7), ]
    length(airquality3$Day)
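The two approaches can be cross-checked against each other. The following is only a sketch: the names `drop`, `by.logical`, `by.negative` and `by.subset` are made up for this illustration. It builds the logical vector explicitly, removes the flagged rows both ways, and compares the results with the `subset()` call.

```r
data(airquality)

# Logical vector marking the rows to drop: days 5 and 7 of May
drop <- airquality$Month == 5 & airquality$Day %in% c(5, 7)

# Approach 1: keep every row that is not flagged
by.logical <- airquality[!drop, ]

# Approach 2: convert the flags to row positions and negate them
by.negative <- airquality[-which(drop), ]

# Compare both with the subset() form
by.subset <- subset(airquality, !(Day %in% c(5, 7) & Month == 5))
identical(by.logical, by.subset)
identical(by.logical, by.negative)

# Number of rows removed
nrow(airquality) - nrow(by.logical)
# [1] 2
```

Both comparisons should return `TRUE` here, since `Day` and `Month` contain no missing values. With missing values in the condition, `subset()` drops the affected rows while bracket indexing returns rows filled with `NA`, so the two forms are not interchangeable in general.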

— *Claudia Beleites 2008/01/02*

Be careful with logical *versus* numeric index vectors:

    new.data <- data[!outliers, ]  # logical indices
    new.data <- data[-outliers, ]  # numeric indices

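Using the numeric form with a logical index vector deletes only the first row. `TRUE` and `FALSE` are coerced to 1 and 0, so the negated vector contains only -1 and 0, and zeros are ignored in an index: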

    > data <- 1:10
    > outliers <- data %in% 3:7
    > new.data <- data[!outliers]   # (desired effect)
    > new.data
    [1]  1  2  8  9 10
    > new.data <- data[-outliers]   # (wrong code)
    > new.data
    [1]  2  3  4  5  6  7  8  9 10
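If negative indices are wanted for a logical vector, one option is to convert it to positions with `which()` first. The sketch below reuses `data` and `outliers` from the example above; `none` is a hypothetical name introduced here. It also shows a corner case worth knowing: when no element is flagged, `-which(...)` is an empty index and selects nothing, whereas the logical form keeps everything.

```r
data <- 1:10
outliers <- data %in% 3:7

# which() turns the logical vector into positions 3 to 7, which can be negated
data[-which(outliers)]
# [1]  1  2  8  9 10

# Corner case: no outliers at all
none <- data %in% 11:12   # all FALSE
data[!none]               # logical form keeps every element
# [1]  1  2  3  4  5  6  7  8  9 10
data[-which(none)]        # empty index selects nothing
# integer(0)
```

For this reason the logical form `data[!outliers]` is usually the safer choice when the index comes from a condition.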