Here are some R packages and approaches for transferring data between Excel spreadsheets and R. Considerations when deciding which approach to take include your platform and whether Excel itself must be installed:
Windows only. Excel must be installed:
Windows only. Excel not needed:
library(xlsReadWrite)
DF1 <- read.xls("test.xls")                      # read 1st sheet
DF2 <- read.xls("test.xls", sheet = 2)           # read 2nd sheet
test1 <- read.xls("test.xls", sheet = "test1")   # read sheet test1
Windows/Mac/Linux. Excel not needed:
options(java.parameters = "-Xmx4g")
and use this command after any operation that uses a large amount of memory:
xlcFreeMemory()   # assumption: XLConnect's function for releasing Java heap memory
This is discussed further here: https://stat.ethz.ch/pipermail/r-help/attachments/20130611/9386269b/attachment.pl
library(dataframes2xls)
df1 <- data.frame(c1 = 1:2, c2 = 3:4, c3 = 5:6)
df2 <- data.frame(c21 = c(10.10101010101, 20, 3), c22 = c(50E50, 60, 3))
outFile <- 'df12.xls'
write.xls(c(df1, df2), outFile)
library(gdata)
# read sheet off the net, ignoring rows prior to the first row
# having a cell containing the word State
crime.url <- "http://www.jrsainfo.org/jabg/state_data2/Tribal_Data00.xls"
crime <- read.xls(crime.url, pattern = "State")
Some caveats when using read.xls in gdata:

(1) read.xls uses a Perl program that produces an intermediate file with comma-separated values (csv) and quoted fields. By default the Perl program escapes the quotes in the input data with backslashes; it then reads that intermediate file with read.table. Unfortunately, read.table does not understand backslash-escaped quotes as representing quotes and interprets them as a literal backslash followed by a quote. Thus, if your data values contain quotes, use read.xls(..., quote = "") and fix up the data in R. If your data values contain quotes and commas but no tabs, use read.xls(..., quote = "", method = "tab"). If these approaches do not work, use one of the additional functions (xls2csv, etc.) mentioned previously and read in the intermediate file yourself.

(2) xls files are read using the formatted precision (if they have been formatted in Excel), whereas xlsx files are read using the full underlying precision.

(3) Excel sheets with accounting format will produce data with parentheses in them, which will have to be removed in R before converting to numeric.

(4) read.xls passes the path to Perl, which does not do ~ expansion, so if your path includes a tilde to denote the home directory, use path.expand explicitly: read.xls(path.expand("~/myfile.csv")).

(5) At one time Rtools included Perl, and since many R users have Rtools installed they had Perl too, but more recent versions of Rtools do not include Perl, so Windows users may need to install it themselves (it is already present on most UNIX systems). ActivePerl is an automated installer for Perl that is very easy to use.

(6) Be sure to use at least gdata version 2.12.0, as some bugs were fixed in that version.
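As a sketch of caveat (3), suppose read.xls has returned a character column in accounting format (the data frame and column name here are hypothetical); the parentheses and thousands separators can be cleaned up in base R like this:

```r
# hypothetical data frame as read.xls might return it: accounting format
# uses commas as thousands separators and wraps negatives in parentheses
DF <- data.frame(Amount = c("1,234.56", "(789.01)"), stringsAsFactors = FALSE)

x <- gsub(",", "", DF$Amount)          # drop thousands separators
neg <- grepl("^\\(.*\\)$", x)          # parentheses denote negative values
x <- as.numeric(gsub("[()]", "", x))   # remove parentheses, convert to numeric
x[neg] <- -x[neg]                      # restore the sign
DF$Amount <- x
```

After this, DF$Amount holds the numeric values 1234.56 and -789.01.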
library(RODBC)
# the comments below relate to RODBC used with the Excel 2003 ODBC driver on Windows Vista
con <- odbcConnectExcel("test.xls")
# list sheet names and other info in alphabetical order --
# NOT the order that the sheets appear in the workbook
sqlTables(con)
DF <- sqlFetch(con, "test1")   # get sheet called test1
# read named range MyData
MyData <- sqlQuery(con, "select * from MyData", na.strings = "NA", as.is = TRUE)
close(con)
library(WriteXLS)
df1 <- data.frame(c1 = 1:2, c2 = 3:4, c3 = 5:6)
df2 <- data.frame(c21 = c(10.10101010101, 20, 3), c22 = c(50E50, 60, 3))
outFile <- 'df12.xls'
WriteXLS(c("df1", "df2"), outFile)   # WriteXLS takes the *names* of the data frames
# another example
iris.split <- split(iris, iris$Species)
WriteXLS("iris.split", "iris_split.xls")
library(xlsx)
# read sheets
names(getSheets(loadWorkbook("test.xlsx")))           # list sheet names
DF <- read.xlsx("test.xlsx", 1)                       # read first sheet
test1 <- read.xlsx("test.xlsx", sheetName = "test1")  # read sheet named test1
# write sheets (based on post by Don MacQueen on r-help)
df1 <- data.frame(c1 = 1:2, c2 = 3:4, c3 = 5:6)
df2 <- data.frame(c21 = c(10.10101010101, 20, 3), c22 = c(50E50, 60, 3))
outFile <- 'df12.xlsx'
wb <- createWorkbook()
sh1 <- createSheet(wb, 'sheet1')
addDataFrame(df1, sh1)
sh2 <- createSheet(wb, 'sheet2')
addDataFrame(df2, sh2)
saveWorkbook(wb, outFile)
Much of the remaining info on this page is outdated and probably should be deleted.
Read import_table to get some ideas about how to bring data into R from a text file.
Most commonly, people seem to want to import Microsoft Excel spreadsheets. Be sure to prepare your data in Excel so that the names of the variables are at the top of each column of data, and you have numbers or NA filled in for all cells (although this last part is not always necessary, as noted in some of the examples below).
Perhaps the quickest way to import a ‘small’ amount of data from almost any Windows application (an MS Excel spreadsheet, an MS Access database query or table, or even a delimited text file) is to select the text (including column headings) or the rows (in an MS Access table or query) with the mouse and copy it to the clipboard (Ctrl-C). Then type the following command at the R prompt:
myDF <- read.delim("clipboard")
Your data are now saved in an object called myDF. Inspect your data before using them. The following example demonstrates how to go the other direction: how to get a ‘small’ amount of data out of R into Excel.
## export 'iris' data to clipboard
write.table(iris, "clipboard", sep = "\t", col.names = NA)
## then open up MS Excel and paste (Ctrl-V) in iris data
1) Date values may not work as expected using the above approaches.
2) I don’t know what the size limit is for the Windows clipboard, but be aware that there is a limit to the amount of data it can hold. The above methods work relatively well for ‘small’ data sets of a few hundred cases or less.
3) This will probably not work if running the commands from an editor such as R-editor or Tinn-R. Those programs use the clipboard to carry commands from the editor to the R console, which temporarily displaces the Excel (or any other program’s) data that had just been copied.
Oftentimes a single row or column vector of data needs to be imported into R to perform simple calculations (like those you would normally do in a spreadsheet), to graph, or to use as input to a function. What follows are some examples of how to get data from Excel into R for these purposes.
(Your spreadsheet might look like this.)
|       | col B       | col C                | col D |
| row 1 | x <- scan() | y <- scan(, what="") |       |
| row 2 | 1           |                      |       |
| row 3 | 2           |                      |       |
| row 4 | 3           |                      |       |
| row 5 | 4           |                      |       |
| row 6 | 23          |                      |       |
| row 7 |             |                      |       |
With your mouse, select from row 1, col B to row 7, col B in your spreadsheet (be sure to include the blank cell in row 7), copy (Ctrl-C), and paste (Ctrl-V) into R. Now you have an object in R called ‘x’ with the values 1, 2, 3, 4, and 23. You can now use ‘x’ for whatever purpose you were planning.
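The same paste-into-scan() flow can be reproduced without a spreadsheet by feeding scan() a text connection holding the values from the example above (interactively, scan() stops reading at a blank line, which is why the blank cell in row 7 matters):

```r
# simulate pasting the column into scan(): one value per line,
# terminated by an empty line (the blank cell in row 7)
x <- scan(textConnection("1\n2\n3\n4\n23\n\n"))
x
```

The result is the numeric vector 1, 2, 3, 4, 23, ready for calculations or plotting.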
The above methods work fine when you have a few hundred cases and a limited number of columns. When your data set has grown beyond those limits, though, there are better and safer methods for getting your data into R using spreadsheets and databases.
For reading data from Microsoft Access, see microsoft_access.
Some of these methods are described below for Excel, but recall that Excel has a limit on the size of the worksheet. The maximum worksheet size for Excel 2000 is 65,536 rows by 256 columns. The maximum worksheet size for Excel 12 (released as Excel 2007) is 1,048,576 rows by 16,384 columns. If your data exceed Excel’s limits, you may need to use Access or another relational database application.
I have not tested the following approach in applications other than Excel and Access, but I think these can be modified and used for non-MS applications.
The safest approach is to define a named range in Excel (2000) by selecting Name » Define from the Insert menu. Name and define the range of data using the dialog box, then save your Excel workbook. Let’s say I named my range of data “MyData” and saved the Excel file as “Test.xls”. Use the following code to read the data into R using the RODBC package.
library(RODBC)
MyExcelData <- sqlQuery(odbcConnectExcel("Test.xls"),
                        "select * from MyData",
                        na.strings = "NA", as.is = TRUE)
odbcCloseAll()
Use the following code to import all of the worksheet called “Sheet1”. The hazard with this approach is that any and all data in that worksheet will be copied in, including data that are hidden or that you otherwise did not intend to bring in.
library(RODBC)
MyExcelData <- sqlFetch(odbcConnectExcel("Test.xls"),
                        sqtable = "Sheet1",
                        na.strings = "NA", as.is = TRUE)
odbcCloseAll()
Excel 2003 (and earlier?) uses the first 0-16 rows of a column to guess its data type. Consider a column of international postal codes where the first 20 rows contain values like 50010 and the next two rows contain 500A1 and 500E1. The value ‘500A1’ is likely to be interpreted as a missing value, and ‘500E1’ may be interpreted as a numeric value in exponential format. More information can be found here: http://www.dicks-blog.com/archives/2004/06/03/external-data-mixed-data-types/.
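To see in plain R why those two values get mangled, note what numeric coercion does to them; this is essentially what the driver’s type guess applies to the whole column once it has decided the column is numeric:

```r
# "500E1" parses as scientific notation: 500 * 10^1
as.numeric("500E1")
# [1] 5000

# "500A1" cannot be parsed as a number, so it becomes NA (with a warning)
suppressWarnings(as.numeric("500A1"))
# [1] NA
```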
— Nick Drew 2006/04/19 07:48
There are several alternatives to read xls (Excel 97 through 2003) or xlsx (Excel 2007) files directly:
library(xlsReadWrite)

### create some test^H^H^H^Hbikedata
tdat <- data.frame(
    Price  = c(6399, 3699, 2499),
    Amount = c(2, 3, 1),
    Date   = c(39202, 39198, 39199),
    row.names = c("Pro machine", "Road racer", "Streetfire"))

### write
write.xls(tdat, "bikes.xls")

### read and check
# read as data.frame
bikes1 <- read.xls(file = "bikes.xls")
if (!identical(tdat, bikes1)) stop("oops, not good...")

# read as data.frame (custom colnames, date as iso-string)
bikes2 <- read.xls(file = "bikes.xls",
                   colNames = c("", "CHF", "Number", "Date"),
                   from = 2,
                   colClasses = c("numeric", "numeric", "isodate"))
if (!all(tdat$Date == isoStrToDateTime(bikes2$Date))) stop("oops, not good...")

# read as matrix
bikes3 <- read.xls(file = "bikes.xls", type = "double")
if (!identical(as.matrix(tdat), bikes3)) stop("oops, not good...")
xlsReadWrite has some non-standard aspects, hence consider the following remarks:
xlsReadWrite has the same problems reading columns of mixed data as mentioned in the “Caution” section above. Type guessing for data.frame variables works like this: at most 16 rows are considered, and the first non-empty cell value determines the type. Example: a numeric value in the 1st row determines the type (numeric); a string value in the 2nd row that cannot be converted to a number will then be returned as NA.
Solution: specify a colClasses argument and decide explicitly whether you want numbers or characters. [In the pro version you can also read (an excerpt of) a single column and check the needed type yourself. Note: the above example would work well with the pro version, as its guessing algorithm considers all 16 rows (but it too would fail if the character value were in row 17 or later).]