— Philippe Grosjean 2006/02/05 [Last tested: R 2.2.1 Win XP]
Timings are done on a Pentium IV 3Ghz machine under Win XP. If you have a slower machine, think to reduce the size of the calculation!
Question:
I want to generate some character strings like (here withn.item = 5, but I want to vary it):
[1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004" [7] "i002.005" "i003.004" "i003.005" "i004.005"
(question adapted from Taka Matzmoto on R-Help, with some code similar to generateIndex1()/generateIndex2() and solutions proposed by John Fox and Jim Holtman)
generateIndex1 <- function(n.item) { Res <- character(0) # initialize vector for (i in 1:(n.item - 1)) { for (j in ((i+1):n.item)) { # concatenate the results Res <- c(Res, paste("i", formatC(i, digits = 2, flag = "0"), ".", formatC(j, digits = 2, flag = "0"), sep = "")) } } Res }
> # Verification > generateIndex1(5) [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004" [7] "i002.005" "i003.004" "i003.005" "i004.005"
generateIndex2 <- function(n.item) { result <- rep("", n.item * (n.item - 1) / 2) index <- 0 for (i in 1:(n.item - 1)) { for (j in ((i + 1):n.item)) { index <- index + 1 result[index] <- paste("i", formatC(i, digits = 2, flag = "0"), ".", formatC(j, digits = 2, flag = "0"), sep = "") } } result }
> # Verification > generateIndex2(5) [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004" [7] "i002.005" "i003.004" "i003.005" "i004.005"
The difference is: generateIndex1() creates an empty character vector and concatenate to it (simplest code), and generateIndex2() creates a vector of empty characters of the correct size [result <- rep(““, n.item * (n.item - 1) / 2)]. The second solution is supposed to be better, because result is supposed to be of the right size, limiting useless memory pagination inside each loop iteration.
However:
> system.time(generateIndex1(100)) [1] 4.86 0.00 4.86 NA NA > system.time(generateIndex2(100)) [1] 4.68 0.00 4.68 NA NA
There is not much difference (well, indeed, the loops and what’s calculated repeatedly inside takes much more time in this case). I wonder what happens if I allocate a vector of the right size with strings having also the right number of characters:
generateIndex3 <- function(n.item) { result <- rep("i000.000", n.item * (n.item - 1) / 2) index <- 0 for (i in 1:(n.item - 1)) { for (j in ((i + 1):n.item)) { index <- index + 1 result[index] <- paste("i", formatC(i, digits = 2, flag = "0"), ".", formatC(j, digits = 2, flag = "0"), sep = "") } } result }
and
> # Verification > generateIndex3(5) [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004 [7] "i002.005" "i003.004" "i003.005" "i004.005" > # Timing > system.time(generateIndex3(100)) [1] 4.63 0.02 4.66 NA NA
About the same.
In the case of string vectors, it does not help much to allocate the correct number of characters, only the length of the vector matters. This is not much obvious in our example, because most of the work is compiling the strings, but look at here:
createVector1 <- function(n.item) { res <- character(0) for (i in 1:n.item) res <- c(res, "bbbbb") res } createVector2 <- function(n.item) { res <- rep("", n.item) for (i in 1:n.item) res[i] <- "bbbbb" res } createVector3 <- function(n.item) { res <- rep("aaaaa", n.item) for (i in 1:n.item) res[i] <- "bbbbb" res }
> # Verification > createVector1(5) [1] "bbbbb" "bbbbb" "bbbbb" "bbbbb" "bbbbb" > createVector2(5) [1] "bbbbb" "bbbbb" "bbbbb" "bbbbb" "bbbbb" > createVector3(5) [1] "bbbbb" "bbbbb" "bbbbb" "bbbbb" "bbbbb" > > # timing > system.time(createVector1(100000)) [1] 68.33 8.51 77.46 NA NA > system.time(createVector2(100000)) [1] 0.28 0.00 0.29 NA NA > system.time(createVector3(100000)) [1] 0.28 0.00 0.28 NA NA
Now, let’s come back to our problem. Where do we waste computing time?
> Rprof() > res <- generateIndex3(100) > Rprof(NULL) > ?summaryRprof > summaryRprof() $by.self self.time self.pct total.time total.pct formatC 0.48 10.5 4.30 93.9 paste 0.46 10.0 4.54 99.1 pmax 0.44 9.6 0.66 14.4 as.integer 0.30 6.6 0.34 7.4 as.logical 0.24 5.2 0.34 7.4 names 0.20 4.4 0.24 5.2 [...]
For sure! Why do I call formatC() twice in each loop? I can increase speed by formatting my character strings only once.
generateIndex4 <- function(n.item) { result <- rep("i000.000", n.item * (n.item - 1) / 2) index <- 0 id <- formatC(1:n.item, digits = 2, flag = "0") for (i in 1:(n.item - 1)) { for (j in ((i + 1):n.item)) { index <- index + 1 result[index] <- paste("i", id[i], ".", id[j], sep = "") } } result }
and
> # Verification > generateIndex4(5) [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004" [7] "i002.005" "i003.004" "i003.005" "i004.005" > > # Timing > system.time(generateIndex4(100)) [1] 0.33 0.00 0.33 NA NA
Yes! That’s much better. Now, recall that it is better to use a vectorized algorithm than loops, could I get rid of these two ugly loops? Here is something using outer() and lower.tri():
generateIndex5 <- function(n.item) { idx <- function(x, y) paste("i", x, ".", y, sep = "") id <- formatC(1:n.item, digits = 2, flag = "0") allidx <- t(outer(id, id, idx)) allidx[lower.tri(allidx)] }
and
> # Verification > generateIndex5(5) [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004" [7] "i002.005" "i003.004" "i003.005" "i004.005" > > # Timing > system.time(generateIndex5(100)) [1] 0.02 0.00 0.02 NA NA
Indeed! That code is much, much faster! Now, let’s compare generateIndex1() with generateIndex5().
generateIndex5() is optimized for speed (4.86/0.02, about 250 times faster!)generateIndex5() is more concise code: 4 lines, no loops, against 8 lines with two loops.generateIndex1() is the code that comes to mind more easily (except, perhaps for some R experts (?) because thinking with vectors is their second nature).generateIndex1() is much easier to understand, when you read the code (for the same reason).Could we do even better?
Solution proposed by Romain François.
Actually, reading the code of the fifth solution, it is quite obvious that it can be speeded up. The matrix allidx is entirely calculated when only the lower triangular part is wanted. One can write R code that calculates only the lower triangular part of the matrix. for instance:
generateIndex5b <- function(n.item) { idx <- function(x, y) ifelse(y > x, paste("i", x, ".", y, sep = ""), "") id <- formatC(1:n.item, digits = 2, flag = "0") allidx <- as.vector(t(outer(id, id, idx))) allidx[allidx != ""] }
However, looking at ifelse(), it appears that the whole matrix is still calculated internally, and then, a selection is made according to the test! Calculation of the lower triangular part is only apparent in the idx() function and consequently, generateIndex5b() is not an improvement over generateIndex5() (it is actually much slower)!
One solution could be to get the hands dirty and write some (simple) C code and use the R calling mechanism to call it.
#include <R.h> void generateIndex6(int * n, char ** out){ int i,j,l; int sizeOf8Chars = sizeof("11111111") ; for(i=1, l=0; i < *n+1; i++){ for(j=i+1; j < *n+1; j++, l++){ // if you omit that part, some funny things // will happen with the memory out[l] = R_alloc(sizeOf8Chars, 1) ; sprintf(out[l], "i%03d.%03d", i, j) ; } } }
That code is stored in a file (say code.c) and compiled that way1):
R CMD SHLIB code.c
Under Linux/Unix, it creates a code.so file
Under Windows, it is named code.dll.
Now, we just have to write some R wrapper to call this code:
generateIndex6 <- function(n.item) { .C('generateIndex6', n = as.integer(n.item), out = character(n.item * (n.item - 1) / 2))$out }
That document describes the .C interface. Writing R extensions is useful too.
Then, from the R prompt :
> # First change your working directory to the one that contains your shared library > # setwd("[mypath]") # Adapt [mypath] of course and uncomment! > dyn.load("code.so") # Under Windows, use dyn.load("code") instead! > # Verification > generateIndex6(5) [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004" [7] "i002.005" "i003.004" "i003.005" "i004.005"
Is it really worth the challenge?
> # Timing > system.time(generateIndex5(1000)) [1] 2.24 0.00 2.23 NA NA > system.time(generateIndex6(1000)) [1] 1.13 0.00 1.13 NA NA > # Do not forget to unload C code when you don't need it any more! > dyn.unload("code")
Comparing generateIndex5() with generateIndex6(), one gets a rather obvious conclusion: the solution using C code is faster... but it is much an effort, and it is less easy to port that function (you must recompile the shared library on the other platforms).
Now, generateIndex6() is only twice faster than generateIndex5(), and with an example where the R code wastes time on useless calculations. What would happen if we needed the whole matrix instead? Well, change the C and R code as follows:
#include <R.h> void generateIndex6(int * n, char ** out){ int i,j,l; int sizeOf8Chars = sizeof("11111111") ; for(i=1, l=0; i < *n+1; i++){ for(j=1; j < *n+1; j++, l++){ // if you omit that part, some funny things // will happen with the memory out[l] = R_alloc(sizeOf8Chars, 1) ; sprintf(out[l], "i%03d.%03d", i, j) ; } } }
Recompile this code with:
R CMD SHLIB code.c
Then, source the new generateIndex5() and generateIndex6() functions that return the complete matrix:
generateIndex5 <- function(n.item) { idx <- function(x, y) paste("i", x, ".", y, sep = "") id <- formatC(1:n.item, digits = 2, flag = "0") t(outer(id, id, idx)) } generateIndex6 <- function(n.item) { res <- .C('generateIndex6', n = as.integer(n.item), out = character(n.item ^ 2))$out dim(res) <- c(n.item, n.item) # Make a matrix return(res) }
And then:
> # Verification > generateIndex5(5) [,1] [,2] [,3] [,4] [,5] [1,] "i001.001" "i002.001" "i003.001" "i004.001" "i005.001" [2,] "i001.002" "i002.002" "i003.002" "i004.002" "i005.002" [3,] "i001.003" "i002.003" "i003.003" "i004.003" "i005.003" [4,] "i001.004" "i002.004" "i003.004" "i004.004" "i005.004" [5,] "i001.005" "i002.005" "i003.005" "i004.005" "i005.005" > generateIndex6(5) [,1] [,2] [,3] [,4] [,5] [1,] "i001.001" "i002.001" "i003.001" "i004.001" "i005.001" [2,] "i001.002" "i002.002" "i003.002" "i004.002" "i005.002" [3,] "i001.003" "i002.003" "i003.003" "i004.003" "i005.003" [4,] "i001.004" "i002.004" "i003.004" "i004.004" "i005.004" [5,] "i001.005" "i002.005" "i003.005" "i004.005" "i005.005" > # Timing > system.time(generateIndex5(1000)) [1] 1.53 0.01 1.55 NA NA > system.time(generateIndex6(1000)) [1] 3.37 0.10 3.47 NA NA
Now, it is generateIndex5() (the pure R function) that is more than twice faster than generateIndex6() (the C code) function. So, well-optimized/vectorized R code can be as fast, or even faster than plain, unoptimized, C code!
more vectorization, proposed by Romain François.
Let’s try yet another approach, with plain R code (one simple problem, many different algorithms
):
generateIndex7 <- function(n) { mat <- matrix(rep(1:n, n), n) sprintf("i%03d.%03d", rep(1:(n-1), (n-1):1), mat[lower.tri(mat)]) }
and
> # Verification > generateIndex7(5) [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004" [7] "i002.005" "i003.004" "i003.005" "i004.005" > # Timing > system.time(generateIndex7(1000)) [1] 1.05 0.00 1.04 NA NA
This solution in pure R code is now competing on speed with the one using C code. Vectorization is the key. Everything here is vectorized, the rep(), the [, the sprintf().
Somehow, I am still frustrated that we need to make that matrix mat that have all the same columns. Maybe memory is so cheap that it is not a problem?
The C strikes back, by Romain François.
A problem with solution 6 is that you have to reallocate the memory, recall that line;
out[l] = R_alloc(sizeOf8Chars, 1) ;
So, let’s give the C side another chance, with further optimization.
void g8(int *n, int *m){ int i,j,l; int nplusun=*n+1; for(l=0, i=1; i<nplusun; i++){ for(j=i+1; j<nplusun; j++, l++){ m[l] = j; } } }
and write a short R wrapper:
generateIndex8 <- function(n) { sprintf("i%03d.%03d", rep(1:(n-1), (n-1):1), .C('g8', n = as.integer(n), m = integer(n * (n-1) / 2))$m) }
Sometimes 7 is faster, sometimes 8 is faster, I don’t know what to conclude. Actually I know, I should just act as a statistician and try the test:
> x <- numeric(50) > for (i in 1:50) + x[i] <- system.time(generateIndex8(1000))[3] > y <- numeric(50) > for(i in 1:50) + y[i] <- system.time(generateIndex7(1000))[3] > t.test(x, y) Welch Two Sample t-test data: x and y t = -15.1922, df = 66.067, p-value < 2.2e-16 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.1500260 -0.1151740 sample estimates: mean of x mean of y 1.1020 1.2346
It seems that the eighth solution is significantly faster, using $$alpha = 5%$$, but there is about only 10% difference between the two solutions, no more! What’s next?
— Tony Plate 2006/02/07 08:01
It’s actually not too difficult to avoid allocating an entire matrix when only the upper triangle is needed:
generateIndex9 <- function(n) { sprintf("i%03d.%03d", rep(1:(n-1), (n-1):1), unlist(sapply(2:n, seq, n), use.names = FALSE)) }
On larger problems it seems to give a speedup compared to “solution 7” above:
> mean(sapply(1:20, function(i) system.time(generateIndex9(2000))[1])) [1] 5.5545 > mean(sapply(1:20, function(i) system.time(generateIndex7(2000))[1])) [1] 7.171
But on the size problem used above, it’s pretty much equivalent to “solution 7”:
> mean(sapply(1:20, function(i) system.time(generateIndex7(1000))[3])) [1] 1.164 > mean(sapply(1:20, function(i) system.time(generateIndex9(1000))[3])) [1] 1.11
So now, let’s compare the optimized pure R code, generateIndex9(), with optimized code mixing R and C, generateIndex8():
> R <- numeric(50) > for (i in 1:50) + R[i] <- system.time(generateIndex9(1000))[3] > C <- numeric(50) > for (i in 1:50) + C[i] <- system.time(generateIndex8(1000))[3] > t.test(R, C) Welch Two Sample t-test data: R and C t = 1.7425, df = 97.964, p-value = 0.08457 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -0.002722405 0.041922405 sample estimates: mean of x mean of y 1.1238 1.1042
... There is no significant difference between both codes at n = 1000, using $$alpha = 5%$$. A good pure R code comes side-by-side in term of speed with a solution involving C, in this particular case.
— Henrik Bengtsson 2007/07/09 14:53
In generateIndex9(n), sprintf() is called (n-1)*n/2 times but are only generating n unique strings. A better solution is to generate the n unique strings once as follows:
generateIndex10 <- function(n) { s <- sprintf("%03d", seq_len(n)); paste( "i", rep(s[1:(n-1)], (n-1):1), ".", unlist(lapply(2:n, function(k) s[k:n]), use.names=FALSE), sep="" ) }
This provides a significant speed up:
t9 <- mean(sapply(1:20, function(i) system.time(generateIndex9(1000))[1])) t10 <- mean(sapply(1:20, function(i) system.time(generateIndex10(1000))[1])) print(t9/t10) 4.03029
This was run on a different machine, but by extrapolation this means a speed up of more than 12000 times compared to the initial solution.
— Tony Breyal 2011-10-27 14:06 BST
NOTE: This is just another way of doing the tenth solution above, with me randomly trying a few things that accidently worked out well!
A speed up is further possible if the indices are calculated prior to inclusion in the string character vectors of the paste(...) function as follows.
generateIndex11 <- function(n) { # initialise vectors len <- (n-1)*n/2 s <- vector(mode = "character", length = n) ind.1 <- vector(mode = "integer", length = len) ind.2 <- vector(mode = "integer", length = len) # set up strings s <- sprintf("%03d", seq_len(n)) # calculate indicies ind.1 <- rep.int(1:(n-1), (n-1):1) ind.2 <- unlist(lapply(2:n, ":", n), use.names = FALSE) # paste strings together return(paste("i", s[ind.1], ".", s[ind.2], sep = "")) }
This gives the following timings (I have also included compiled versions of the functions using the new compiler package because it’s new and cool):
# load packages library(compiler) library(rbenchmark) # compile functions compiled_generateIndex10 <- cmpfun(generateIndex10) compiled_generateIndex11 <- cmpfun(generateIndex11) # set size n <- 5000 # run benchmark benchmark(generateIndex10(n), compiled_generateIndex10(n), generateIndex11(n), compiled_generateIndex11(n), columns = c("test", "replications", "elapsed", "relative"), order = "relative", replications = 10) ###--- TIMINGS ---### test replications elapsed relative 4 compiled_generateIndex11(n) 10 51.46 1.000000 3 generateIndex11(n) 10 51.67 1.004081 2 compiled_generateIndex10(n) 10 54.35 1.056160 1 generateIndex10(n) 10 61.20 1.189273
From the timings above we can see that the eleventh solution can shave about 10 seconds off the non-compiled tenth solution and about 3 seconds off the compiled tenth solution.
If further values of n are benchmarked, we end up with the following:
It seems that compiled_generateIndex11(n) is the overall winner. I ran these on an 8GB Windows 7 x64 PC. At extreme values of n it is interesting to observe the non-compiled version of generateIndex11(n).
— Tony Breyal 2011-10-31 14:20 BST
Thinking about this further, we can avoid calculating the indices vectors altogether from generateIndex11 by noticing the following pattern:
n=5; lapply(2:n, ":", n) [[1]] [1] 2 3 4 5 [[2]] [1] 3 4 5 [[3]] [1] 4 5 [[4]] [1] 5
The pattern we need is effectively “1 with 2,3,4,5 “, “2 with 3,4,5″, “3 with 4, 5″ and “4 with 5″. Therefore a twelfth solution is:
generateIndex12 <- function(n) { s <- sprintf("%03d", seq_len(n)) unlist(lapply(1:(n-1), function(i) paste("i", s[i], ".", s[seq.int(i+1, n)], sep = "")), use.names = FALSE) } # check identical(generateIndex11(500), generateIndex12(500) [1] TRUE
Which gives us the following timings for n = 2000
library(rbenchmark) library(compiler) generateIndex12.compiled <- cmpfun(generateIndex12) gc() n = 2000 benchmark(generateIndex10(n), generateIndex11(n), generateIndex12(n), generateIndex12.compiled(n), columns = c("test", "replications", "elapsed", "relative"), order = "relative", replications = 20) test replications elapsed relative 3 generateIndex12(n) 20 17.80 1.002817 4 generateIndex12.compiled(n) 20 18.16 1.023099 2 generateIndex11(n) 20 19.07 1.074366 1 generateIndex10(n) 20 19.97 1.125070
I think that a speed improvement could be achieved if this version of the algorithm were re-written in C++ using the Rcpp package (though this is beyond my current abilities).
— Tony Breyal 2011-11-01 03:06 BST
It seems that calculating the indices independently and initialising vectors is still slightly faster, hence a thirteenth solution is:
generateIndex13 <- function(n) { # initialise vectors s <- vector(mode = "character", length = n) ind <- vector(mode = "integer", length = (n-1)*n/2) # set up n unique strings s <- sprintf("%03d", seq_len(n)) # calculate sequence of indicies for second part of strings e.g. "002" "003" etc in "i001.002" "i001.003"... ind <- lapply(2:n, ":", n) # paste strings together unlist(lapply(1:(n-1), function(i) paste("i", s[i], ".", s[ind[[i]]], sep = "") ), use.names = FALSE) } # check identical(generateIndex10(500), generateIndex13(500) [1] TRUE
Which we can then benchmark as follows (compiled versions were not included because of constraints on time):
# load packages library(ggplot2) library(rbenchmark) get_bm <- function(n) { print(n) gc() df = benchmark(generateIndex10(n), generateIndex11(n), generateIndex12(n), generateIndex13(n), columns = c("test", "elapsed", "relative"), order = "elapsed", replications = 100) df$n = n return(df) } # get benchmarks for various values of n df.list <- lapply(seq(1000, 10000, by = 200), get_bm) df <- do.call("rbind", df.list) # plot results ggplot(df.compiled, aes(x=n, y=elapsed, colour=test)) + geom_point() + stat_smooth() + xlab("N") + ylab("Time (seconds)")
Bearing in mind that each algorithm is replicated 100 times at each value of n, the thirteenth solution produces a speed up at larger values of n.
Benchmarks performed on R version 2.13.0 on a Windows Server 2008 R2 x64 Workstation PC, 12GB RAM, Intel Xeon CPU X5650 @ 2.66GHz.
— Tony Breyal 2011-12-08
I noticed that the vector ind from the thirteenth solution can be eliminated altogether. Furthermore by replacing the function “paste” by “file.path” an even greater speed up over the previous solutions is possible. This use of “file.path” with parameter fsep=”” only works because there is never a character vector of length 0 for it to deal with.
generateIndex14 <- function(n) { # initialise vectors s <- vector(mode = "character", length = n) # set up n unique strings s <- sprintf("%03d", seq_len(n)) # paste strings together unlist(lapply(1:(n-1), function(i) file.path("i", s[i], ".", s[(i+1):n], fsep = "") ), use.names = FALSE) }
Timings:
test elapsed n replications generateIndex14(n) 27.27500 2000 50 generateIndex13(n) 33.09300 2000 50 generateIndex12(n) 35.31344 2000 50 generateIndex11(n) 36.32900 2000 50
Rcpp: A New Hope — Romain Francois 2011-12-08
(reproduced here with kind permission from Romain, see original: http://romainfrancois.blog.free.fr/index.php?post/2011/11/10/Code-optimization%2C-an-Rcpp-solution )
“...I figured it was time for an Rcpp based solution. This solutions moves down Henrik Bengtsson’s idea (which was at the basis of attempt 10) down to C++. The idea was to call sprintf less than the other solutions to generate the strings “001”, “002”, “003”, ...”
library(Rcpp) library(inline) generateIndex15 <- local( { fun <- cxxfunction( signature( n_ = "integer", width_ = "integer", format_ = "character" ), ' int n = as<int>(n_) ; int width = as<int>( width_ ) ; const char* format = as<const char*>( format_ ) ; std::string buffer( width, \'0\' ) ; std::vector< std::string > elements( n ) ; for( int i=0; i<n; i++){ sprintf( const_cast<char*>(buffer.data() ), format, i+1 ) ; elements[i] = buffer.c_str() ; } std::stringstream ss ; CharacterVector res( n*(n-1)/2) ; for( int i=0, k=0; i<n-1; i++){ for(int j=i+1; j<n; j++, k++){ ss << "i" << elements[i] << "." << elements[j]; res[k] = ss.str(); ss.str( "" ) ; } } return res ; ', plugin = "Rcpp" ) function( n ){ width <- ifelse( n<1000, 3, ceiling( log10(n+1) ) ) fun( n, width, sprintf("%%0%dd", width) ) } } )
Timings:
test elapsed n replications generateIndex15(n) 23.30100 2000 50 generateIndex14(n) 27.27500 2000 50 generateIndex13(n) 33.09300 2000 50 generateIndex12(n) 35.31344 2000 50 generateIndex11(n) 36.32900 2000 50
The R Strikes Back — Tony Breyal 2011-12-08
I really wanted to do this solution using Rcpp but my cpp-foo skills are still very grasshopper-y, so I gave up and did it in R instead :)
This solution works by generating the largest set of strings which start i001.xxx first and then replacing the “001” part with “002”, “003”, “004” for each increment up to n-1.
generateIndex16 <- function(n) { str <- vector("list", length = n-1) s <- vector(mode = "character", length = n) s <- sprintf("%03d", seq_len(n)) str[[1]] <- file.path("i", s[1], ".", s[-1], fsep = "") str[2:(n-1)] <- lapply(2:(n-1), function(i) sub("001", s[i], str[[1]][(i):(n-1)], fixed=TRUE)) unlist(str) }
The above requires matching the “001” first and then replacing it. However, we know that “001” will ALWAYS be in character positions 2, 3 and 4, and so there may be a way to avoid matching part altogether (i.e. replace a fixed position substring with another string of equal or larger length) but I could not work out how to do that outside of a regular expression.
Timings:
test elapsed n replications generateIndex16(n) 20.77200 2000 50 generateIndex15(n) 23.30100 2000 50 generateIndex14(n) 27.27500 2000 50 generateIndex13(n) 33.09300 2000 50 generateIndex12(n) 35.31344 2000 50 generateIndex11(n) 36.32900 2000 50
Here is a ggplot2 graph of the timings for various sizes of n, for the last few solutions, each performed with 50 replications through the rbenchmark package:
Code Optimisation: Revenge of the Rcpp — Romain Francois 2011-12-14
(reproduced here with very kind permission from Romain, see original blog post here: http://romainfrancois.blog.free.fr/index.php?post/2011/12/14/...-And-now-for-solution-17,-still-using-Rcpp )
“Here comes yet another sequel of the code optimization problem from the R wiki, still using Rcpp, but with a different strategy this time. Essentially, my previous version (15) was using stringstream although we don’t really need its functionality and it was slowing us down. Also, the characters “i” and “.” are always on the same position so we can assign them once and for all.
So without further ado, here is attempt 17:”
require(inline) require(Rcpp) generateIndex17 <- local( { fun <- cxxfunction( signature( n_ = "integer", width_ = "integer", format_ = "character" ), ' int n = as<int>(n_) ; int width = as<int>( width_ ) ; const char* format = as<const char*>( format_ ) ; std::string buffer( width, \'0\' ) ; std::vector< std::string > elements( n ) ; for( int i=0; i<n; i++){ sprintf( const_cast<char*>(buffer.data() ), format, i+1 ) ; elements[i] = buffer.c_str() ; } char buf[100] ; buf[0] = \'i\' ; buf[width+1] = \'.\' ; CharacterVector res( n*(n-1)/2) ; for( int i=0, k=0; i<n-1; i++){ strncpy( buf+1, elements[i].data(), width ) ; for(int j=i+1; j<n; j++, k++){ strncpy( buf + 2 + width, elements[j].c_str(), width ) ; res[k] = buf ; } } return res ; ', plugin = "Rcpp" ) function( n ){ width <- ifelse( n<1000, 3, ceiling( log10(n+1) ) ) fun( n, width, sprintf("%%0%dd", width) ) } } )
Here is a graph I came up with on my computer comparing the 16th and 17th solution using the rbenchmark package with 50 replications for various sizes of N (note that the 17th solution is a lot faster than the former 15th Rcpp solution):
The following table summarizes results:
| Solution | vectorized | uses C code | optimized | speed (sec) | comment |
|---|---|---|---|---|---|
| First | no | no | --- | 4.9 (n = 100) | very bad code... but readable |
| Second | no | no | -- | 4.7 (n = 100) | idem, but preallocates the vector |
| Third | no | no | -- | 4.7 (n = 100) | no improvement over solution #2 |
| Fourth | no | no | - | 0.33 (n = 100) | optimization inside the loops |
| Fifth | yes | no | + | 2.2 (n = 1000) | on the right way! (no loops) |
| Sixth | no | yes | ++ | 1.1 (n = 1000) | unoptimized C code |
| Seventh | yes | no | ++ | 1.1 (n = 1000) | optimized/vectorized R code |
| Eighth | yes | yes | +++ | 1.1 (n = 1000) | optimized C code |
| Nineth | yes | no | +++ | 1.1 (n = 1000) | fully optimized vect. R code |
| Tenth | yes | no | ++++ | 0.3 (n = 1000) | extra optimized vect. R code |
generateIndex8() and generateIndex9() are the fastest, but the last four solutions are really close in term of speed for n = 1000 (speed with varying n value and memory usage is not discussed here). What is important, is the huge difference between solution 1 and solution 9. I have (n = 1000):
> system.time(generateIndex1(1000)) [1] 3684.91 1.71 3706.53 NA NA > # more than ONE HOUR > system.time(generateIndex9(1000)) [1] 1.15 0.00 1.14 NA NA > # one second
That is, more than a 3000 times increase in speed2). Yet, both functions are pure R code! If you tend to consider R as slow, are you sure you really optimized your code? (same remark for memory usage).
generateIndex1-5() are better R code than the four previous versions (I am sure one can do even better3)!), but it is a little bit more intellectual work to create these codes (i.e., you have to rethink the problem using matrix calculation, and/or, rewrite part of it in C). However, the result is worth the effort!
generateIndex5() is suboptimal in that particular context, because it calculates the whole matrix first, and then, it extracts the lower triangle. However, it is an elegant code for a full matrix solution.
Depending on the problem, C code is not always a panacea. Sometimes, well-vectorized / well-optimized R code comes close in term of speed.
generateIndex1() did not used more than 70Mb of RAM memory on a 1Gb machine; no pagination occured during that computation.