Code optimization - vectorization with character strings

Philippe Grosjean 2006/02/05 [Last tested: R 2.2.1 Win XP]

Timings are done on a Pentium IV 3Ghz machine under Win XP. If you have a slower machine, think to reduce the size of the calculation!

Question:

I want to generate some character strings like (here with n.item = 5, but I want to vary it):
  [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004"
  [7] "i002.005" "i003.004" "i003.005" "i004.005"

(question adapted from Taka Matzmoto on R-Help, with some code similar to generateIndex1()/generateIndex2() and solutions proposed by John Fox and Jim Holtman)

First solution

generateIndex1 <- function(n.item) {
    Res <- character(0)  # initialize vector
    for (i in 1:(n.item - 1)) {
        for (j in ((i+1):n.item)) {
            # concatenate the results
            Res <- c(Res, paste("i", formatC(i, digits = 2, flag = "0"),
                ".", formatC(j, digits = 2, flag = "0"), sep = ""))
        }
    }
    Res
}
 > # Verification
 > generateIndex1(5)
  [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004"
  [7] "i002.005" "i003.004" "i003.005" "i004.005"

Second solution

generateIndex2 <- function(n.item) {
    result <- rep("", n.item * (n.item - 1) / 2)
    index <- 0
    for (i in 1:(n.item - 1)) {
        for (j in ((i + 1):n.item)) {
            index <- index + 1                
            result[index] <- paste("i",
                formatC(i, digits = 2, flag = "0"), ".",
                formatC(j, digits = 2, flag = "0"), sep = "")
        }
    }
    result
}
 > # Verification
 > generateIndex2(5)
  [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004"
  [7] "i002.005" "i003.004" "i003.005" "i004.005"

The difference is: generateIndex1() creates an empty character vector and concatenate to it (simplest code), and generateIndex2() creates a vector of empty characters of the correct size [result <- rep(““, n.item * (n.item - 1) / 2)]. The second solution is supposed to be better, because result is supposed to be of the right size, limiting useless memory pagination inside each loop iteration.

However:

 > system.time(generateIndex1(100))
 [1] 4.86 0.00 4.86   NA   NA
 > system.time(generateIndex2(100))
 [1] 4.68 0.00 4.68   NA   NA

There is not much difference (well, indeed, the loops and what’s calculated repeatedly inside takes much more time in this case). I wonder what happens if I allocate a vector of the right size with strings having also the right number of characters:

Third solution

generateIndex3 <- function(n.item) {
    result <- rep("i000.000", n.item * (n.item - 1) / 2)
    index <- 0
    for (i in 1:(n.item - 1)) {
        for (j in ((i + 1):n.item)) {
            index <- index + 1                
            result[index] <- paste("i",
                formatC(i, digits = 2, flag = "0"), ".",
                formatC(j, digits = 2, flag = "0"), sep = "")
        }
    }
    result
}

and

 > # Verification
 > generateIndex3(5)
 [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004
 [7] "i002.005" "i003.004" "i003.005" "i004.005"
 > # Timing
 > system.time(generateIndex3(100))
 [1] 4.63 0.02 4.66   NA   NA

About the same.


In the case of string vectors, it does not help much to allocate the correct number of characters, only the length of the vector matters. This is not much obvious in our example, because most of the work is compiling the strings, but look at here:

createVector1 <- function(n.item) {
    res <- character(0)
    for (i in 1:n.item)
        res <- c(res, "bbbbb")
    res
}
 
createVector2 <- function(n.item) {
    res <- rep("", n.item)
    for (i in 1:n.item)
    	res[i] <- "bbbbb"
    res
}
 
createVector3 <- function(n.item) {
    res <- rep("aaaaa", n.item)
    for (i in 1:n.item)
    	res[i] <- "bbbbb"
    res
}
 > # Verification
 > createVector1(5)
 [1] "bbbbb" "bbbbb" "bbbbb" "bbbbb" "bbbbb"
 > createVector2(5)
 [1] "bbbbb" "bbbbb" "bbbbb" "bbbbb" "bbbbb"
 > createVector3(5)
 [1] "bbbbb" "bbbbb" "bbbbb" "bbbbb" "bbbbb"
 > 
 > # timing
 > system.time(createVector1(100000))
 [1] 68.33  8.51 77.46    NA    NA
 > system.time(createVector2(100000))
 [1] 0.28 0.00 0.29   NA   NA
 > system.time(createVector3(100000))
 [1] 0.28 0.00 0.28   NA   NA

Now, let’s come back to our problem. Where do we waste computing time?

 > Rprof()
 > res <- generateIndex3(100)
 > Rprof(NULL)
 > ?summaryRprof
 > summaryRprof()
 $by.self
                    self.time self.pct total.time total.pct
 formatC                 0.48     10.5       4.30      93.9
 paste                   0.46     10.0       4.54      99.1
 pmax                    0.44      9.6       0.66      14.4
 as.integer              0.30      6.6       0.34       7.4
 as.logical              0.24      5.2       0.34       7.4
 names                   0.20      4.4       0.24       5.2
 [...]

For sure! Why do I call formatC() twice in each loop? I can increase speed by formatting my character strings only once.

Fourth solution

generateIndex4 <- function(n.item) {
    result <- rep("i000.000", n.item * (n.item - 1) / 2)
    index <- 0
    id <- formatC(1:n.item, digits = 2, flag = "0")
    for (i in 1:(n.item - 1)) {
        for (j in ((i + 1):n.item)) {
            index <- index + 1                
            result[index] <- paste("i", id[i], ".", id[j], sep = "")
        }
    }
    result
}

and

 > # Verification
 > generateIndex4(5)
 [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004"
 [7] "i002.005" "i003.004" "i003.005" "i004.005"
 > 
 > # Timing
 > system.time(generateIndex4(100))
 [1] 0.33 0.00 0.33   NA   NA

Yes! That’s much better. Now, recall that it is better to use a vectorized algorithm than loops, could I get rid of these two ugly loops? Here is something using outer() and lower.tri():

Fifth solution

generateIndex5 <- function(n.item) {
    idx <- function(x, y) paste("i", x, ".", y, sep = "")
    id <- formatC(1:n.item, digits = 2, flag = "0")
    allidx <- t(outer(id, id, idx))
    allidx[lower.tri(allidx)]
}

and

 > # Verification
 > generateIndex5(5)
 [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004"
 [7] "i002.005" "i003.004" "i003.005" "i004.005"
 > 
 > # Timing
 > system.time(generateIndex5(100))
 [1] 0.02 0.00 0.02   NA   NA

Indeed! That code is much, much faster! Now, let’s compare generateIndex1() with generateIndex5().

  1. generateIndex5() is optimized for speed (4.86/0.02, about 250 times faster!)
  2. generateIndex5() is more concise code: 4 lines, no loops, against 8 lines with two loops.
  3. generateIndex1() is the code that comes to mind more easily (except, perhaps for some R experts (?) because thinking with vectors is their second nature).
  4. generateIndex1() is much easier to understand, when you read the code (for the same reason).

Could we do even better?

Sixth Solution

:A+: Solution proposed by Romain François.

Actually, reading the code of the fifth solution, it is quite obvious that it can be speeded up. The matrix allidx is entirely calculated when only the lower triangular part is wanted. One can write R code that calculates only the lower triangular part of the matrix. for instance:

generateIndex5b <- function(n.item) {
    idx <- function(x, y)
        ifelse(y > x, paste("i", x, ".", y, sep = ""), "")
    id <- formatC(1:n.item, digits = 2, flag = "0")
    allidx <- as.vector(t(outer(id, id, idx)))
    allidx[allidx != ""]
}

However, looking at ifelse(), it appears that the whole matrix is still calculated internally, and then, a selection is made according to the test! Calculation of the lower triangular part is only apparent in the idx() function and consequently, generateIndex5b() is not an improvement over generateIndex5() (it is actually much slower)!

One solution could be to get the hands dirty and write some (simple) C code and use the R calling mechanism to call it.

#include <R.h>
 
void generateIndex6(int * n, char ** out){
	int i,j,l;
	
	int sizeOf8Chars = sizeof("11111111") ;
	
	for(i=1, l=0; i < *n+1; i++){
		for(j=i+1; j < *n+1; j++, l++){
	   	
                // if you omit that part, some funny things
                // will happen with the memory
	   	out[l] = R_alloc(sizeOf8Chars, 1) ;
	   	
	   	sprintf(out[l], "i%03d.%03d", i, j) ;	
		}
	}
}

That code is stored in a file (say code.c) and compiled that way1):

R CMD SHLIB code.c

:LINUX: Under Linux/Unix, it creates a code.so file

:WINDOWS: Under Windows, it is named code.dll.

Now, we just have to write some R wrapper to call this code:

generateIndex6 <- function(n.item) {
    .C('generateIndex6', 
        n = as.integer(n.item),
        out = character(n.item * (n.item - 1) / 2))$out 
}

That document describes the .C interface. Writing R extensions is useful too.

Then, from the R prompt :

 > # First change your working directory to the one that contains your shared library
 > # setwd("[mypath]") # Adapt [mypath] of course and uncomment!
 > dyn.load("code.so")    # Under Windows, use dyn.load("code") instead!
 > # Verification
 > generateIndex6(5)
 [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004"
 [7] "i002.005" "i003.004" "i003.005" "i004.005"

Is it really worth the challenge?

 > # Timing
 > system.time(generateIndex5(1000))
 [1] 2.24 0.00 2.23   NA   NA
 > system.time(generateIndex6(1000))
 [1] 1.13 0.00 1.13   NA   NA
 > # Do not forget to unload C code when you don't need it any more!
 > dyn.unload("code")

Comparing generateIndex5() with generateIndex6(), one gets a rather obvious conclusion: the solution using C code is faster... but it is much an effort, and it is less easy to port that function (you must recompile the shared library on the other platforms).

Now, generateIndex6() is only twice faster than generateIndex5(), and with an example where the R code wastes time on useless calculations. What would happen if we needed the whole matrix instead? Well, change the C and R code as follows:

#include <R.h>
 
void generateIndex6(int * n, char ** out){
	int i,j,l;
	
	int sizeOf8Chars = sizeof("11111111") ;
	
	for(i=1, l=0; i < *n+1; i++){
		for(j=1; j < *n+1; j++, l++){
	   	
                // if you omit that part, some funny things
                // will happen with the memory
	   	out[l] = R_alloc(sizeOf8Chars, 1) ;
	   	
	   	sprintf(out[l], "i%03d.%03d", i, j) ;	
		}
	}
}

Recompile this code with:

R CMD SHLIB code.c

Then, source the new generateIndex5() and generateIndex6() functions that return the complete matrix:

generateIndex5 <- function(n.item) {
    idx <- function(x, y) paste("i", x, ".", y, sep = "")
    id <- formatC(1:n.item, digits = 2, flag = "0")
    t(outer(id, id, idx))
}
 
generateIndex6 <- function(n.item) {
    res <- .C('generateIndex6', 
        n = as.integer(n.item),
        out = character(n.item ^ 2))$out
    dim(res) <- c(n.item, n.item) # Make a matrix
    return(res) 
}

And then:

 > # Verification
 > generateIndex5(5)
      [,1]       [,2]       [,3]       [,4]       [,5]      
 [1,] "i001.001" "i002.001" "i003.001" "i004.001" "i005.001"
 [2,] "i001.002" "i002.002" "i003.002" "i004.002" "i005.002"
 [3,] "i001.003" "i002.003" "i003.003" "i004.003" "i005.003"
 [4,] "i001.004" "i002.004" "i003.004" "i004.004" "i005.004"
 [5,] "i001.005" "i002.005" "i003.005" "i004.005" "i005.005"
 > generateIndex6(5)
      [,1]       [,2]       [,3]       [,4]       [,5]      
 [1,] "i001.001" "i002.001" "i003.001" "i004.001" "i005.001"
 [2,] "i001.002" "i002.002" "i003.002" "i004.002" "i005.002"
 [3,] "i001.003" "i002.003" "i003.003" "i004.003" "i005.003"
 [4,] "i001.004" "i002.004" "i003.004" "i004.004" "i005.004"
 [5,] "i001.005" "i002.005" "i003.005" "i004.005" "i005.005"
 > # Timing
 > system.time(generateIndex5(1000))
 [1] 1.53 0.01 1.55   NA   NA
 > system.time(generateIndex6(1000))
 [1] 3.37 0.10 3.47   NA   NA

8-) Now, it is generateIndex5() (the pure R function) that is more than twice faster than generateIndex6() (the C code) function. So, well-optimized/vectorized R code can be as fast, or even faster than plain, unoptimized, C code!

Seventh Solution

:A+: more vectorization, proposed by Romain François.

Let’s try yet another approach, with plain R code (one simple problem, many different algorithms :!:):

generateIndex7 <- function(n) {
    mat <- matrix(rep(1:n, n), n)
    sprintf("i%03d.%03d", rep(1:(n-1), (n-1):1), mat[lower.tri(mat)])
}

and

 > # Verification
 > generateIndex7(5)
 [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004"
 [7] "i002.005" "i003.004" "i003.005" "i004.005"
 > # Timing
 > system.time(generateIndex7(1000))
 [1] 1.05 0.00 1.04   NA   NA

This solution in pure R code is now competing on speed with the one using C code. Vectorization is the key. Everything here is vectorized, the rep(), the [, the sprintf().

Somehow, I am still frustrated that we need to make that matrix mat that have all the same columns. Maybe memory is so cheap that it is not a problem?

Eighth Solution

:A+: The C strikes back, by Romain François.

A problem with solution 6 is that you have to reallocate the memory, recall that line;

out[l] = R_alloc(sizeOf8Chars, 1) ;

So, let’s give the C side another chance, with further optimization.

void g8(int *n, int *m){
    int i,j,l;
    int nplusun=*n+1;
 
    for(l=0, i=1; i<nplusun; i++){
        for(j=i+1; j<nplusun; j++, l++){
            m[l] = j;
        }
    }
}

and write a short R wrapper:

generateIndex8 <- function(n) {
    sprintf("i%03d.%03d", 
        rep(1:(n-1), (n-1):1),
        .C('g8', n = as.integer(n), m = integer(n * (n-1) / 2))$m)
}

Sometimes 7 is faster, sometimes 8 is faster, I don’t know what to conclude. Actually I know, I should just act as a statistician and try the test:

 > x <- numeric(50)
 > for (i in 1:50)
 +    x[i] <-  system.time(generateIndex8(1000))[3]
 > y <- numeric(50)
 > for(i in 1:50)
 +    y[i] <-  system.time(generateIndex7(1000))[3]
 > t.test(x, y)
 
         Welch Two Sample t-test
 
 data:  x and y 
 t = -15.1922, df = 66.067, p-value < 2.2e-16
 alternative hypothesis: true difference in means is not equal to 0 
 95 percent confidence interval:
  -0.1500260 -0.1151740 
 sample estimates:
 mean of x mean of y 
    1.1020    1.2346 

It seems that the eighth solution is significantly faster, using $$alpha = 5%$$, but there is about only 10% difference between the two solutions, no more! What’s next?

Ninth solution (!)

:A+:Tony Plate 2006/02/07 08:01

It’s actually not too difficult to avoid allocating an entire matrix when only the upper triangle is needed:

generateIndex9 <- function(n) {
    sprintf("i%03d.%03d",
        rep(1:(n-1), (n-1):1),
        unlist(sapply(2:n, seq, n), use.names = FALSE))
}

On larger problems it seems to give a speedup compared to “solution 7” above:

 > mean(sapply(1:20, function(i) system.time(generateIndex9(2000))[1]))
 [1] 5.5545
 > mean(sapply(1:20, function(i) system.time(generateIndex7(2000))[1]))
 [1] 7.171

But on the size problem used above, it’s pretty much equivalent to “solution 7”:

 > mean(sapply(1:20, function(i) system.time(generateIndex7(1000))[3]))
 [1] 1.164
 > mean(sapply(1:20, function(i) system.time(generateIndex9(1000))[3]))
 [1] 1.11

So now, let’s compare the optimized pure R code, generateIndex9(), with optimized code mixing R and C, generateIndex8():

 > R <- numeric(50)
 > for (i in 1:50)
 +    R[i] <-  system.time(generateIndex9(1000))[3]
 > C <- numeric(50)
 > for (i in 1:50)
 +    C[i] <-  system.time(generateIndex8(1000))[3]
 > t.test(R, C)
 
        Welch Two Sample t-test
 
 data:  R and C 
 t = 1.7425, df = 97.964, p-value = 0.08457
 alternative hypothesis: true difference in means is not equal to 0 
 95 percent confidence interval:
  -0.002722405  0.041922405 
 sample estimates:
 mean of x mean of y 
    1.1238    1.1042 

... There is no significant difference between both codes at n = 1000, using $$alpha = 5%$$. A good pure R code comes side-by-side in term of speed with a solution involving C, in this particular case.

Tenth solution (!)

:A+:Henrik Bengtsson 2007/07/09 14:53

In generateIndex9(n), sprintf() is called (n-1)*n/2 times but are only generating n unique strings. A better solution is to generate the n unique strings once as follows:

generateIndex10 <- function(n) {
  s <- sprintf("%03d", seq_len(n));
  paste(
    "i",
    rep(s[1:(n-1)], (n-1):1),
    ".",
    unlist(lapply(2:n, function(k) s[k:n]), use.names=FALSE),
    sep=""
  )
}

This provides a significant speed up:

t9 <- mean(sapply(1:20, function(i) system.time(generateIndex9(1000))[1]))
t10 <- mean(sapply(1:20, function(i) system.time(generateIndex10(1000))[1]))
print(t9/t10)
 4.03029

This was run on a different machine, but by extrapolation this means a speed up of more than 12000 times compared to the initial solution.

Eleventh solution (!)

:A+:Tony Breyal 2011-10-27 14:06 BST

NOTE: This is just another way of doing the tenth solution above, with me randomly trying a few things that accidently worked out well!

A speed up is further possible if the indices are calculated prior to inclusion in the string character vectors of the paste(...) function as follows.

generateIndex11 <- function(n) { 
  # initialise vectors
  len <- (n-1)*n/2
  s <- vector(mode = "character", length = n)
  ind.1 <- vector(mode = "integer", length = len)
  ind.2 <- vector(mode = "integer", length = len)
  
  # set up strings     
  s <- sprintf("%03d", seq_len(n))
  
  # calculate indicies
  ind.1 <- rep.int(1:(n-1), (n-1):1)
  ind.2 <- unlist(lapply(2:n, ":", n), use.names = FALSE)
  
  # paste strings together
  return(paste("i", s[ind.1], ".", s[ind.2], sep = ""))
}

This gives the following timings (I have also included compiled versions of the functions using the new compiler package because it’s new and cool):

# load packages
library(compiler)
library(rbenchmark)
 
# compile functions
compiled_generateIndex10 <- cmpfun(generateIndex10)
compiled_generateIndex11 <- cmpfun(generateIndex11)
 
# set size
n <- 5000
 
# run benchmark
benchmark(generateIndex10(n), compiled_generateIndex10(n), generateIndex11(n), compiled_generateIndex11(n),
          columns = c("test", "replications", "elapsed", "relative"),
          order = "relative", 
          replications = 10)
 
 
###--- TIMINGS ---###
                          test replications elapsed relative
4 compiled_generateIndex11(n)            10   51.46 1.000000
3          generateIndex11(n)            10   51.67 1.004081
2  compiled_generateIndex10(n)           10   54.35 1.056160
1           generateIndex10(n)           10   61.20 1.189273

From the timings above we can see that the eleventh solution can shave about 10 seconds off the non-compiled tenth solution and about 3 seconds off the compiled tenth solution.

If further values of n are benchmarked, we end up with the following:

:tips:programming:algorithm_benchmarks_ggplot2_.jpeg

It seems that compiled_generateIndex11(n) is the overall winner. I ran these on an 8GB Windows 7 x64 PC. At extreme values of n it is interesting to observe the non-compiled version of generateIndex11(n).

Twelfth solution (!)

:A+:Tony Breyal 2011-10-31 14:20 BST

Thinking about this further, we can avoid calculating the indices vectors altogether from generateIndex11 by noticing the following pattern:

n=5; lapply(2:n, ":", n)
[[1]]
[1] 2 3 4 5
 
[[2]]
[1] 3 4 5
 
[[3]]
[1] 4 5
 
[[4]]
[1] 5

The pattern we need is effectively “1 with 2,3,4,5 “, “2 with 3,4,5″, “3 with 4, 5″ and “4 with 5″. Therefore a twelfth solution is:

generateIndex12 <- function(n) {
  s <- sprintf("%03d", seq_len(n))
  unlist(lapply(1:(n-1), function(i) 
                           paste("i", s[i], ".", s[seq.int(i+1, n)], sep = "")), use.names = FALSE)
}
 
# check
identical(generateIndex11(500), generateIndex12(500)
[1] TRUE

Which gives us the following timings for n = 2000

library(rbenchmark)
library(compiler)
 
generateIndex12.compiled <- cmpfun(generateIndex12)
gc()
n = 2000
benchmark(generateIndex10(n), generateIndex11(n),
          generateIndex12(n), generateIndex12.compiled(n),
          columns = c("test", "replications", "elapsed", "relative"),
          order = "relative",
          replications = 20)
 
                         test replications elapsed relative
3          generateIndex12(n)           20   17.80 1.002817
4 generateIndex12.compiled(n)           20   18.16 1.023099
2          generateIndex11(n)           20   19.07 1.074366
1          generateIndex10(n)           20   19.97 1.125070

I think that a speed improvement could be achieved if this version of the algorithm were re-written in C++ using the Rcpp package (though this is beyond my current abilities).

Thirteenth solution (!)

:A+:Tony Breyal 2011-11-01 03:06 BST

It seems that calculating the indices independently and initialising vectors is still slightly faster, hence a thirteenth solution is:

generateIndex13 <- function(n) {
  # initialise vectors
  s <- vector(mode = "character", length = n)
  ind <- vector(mode = "integer", length = (n-1)*n/2)
 
  # set up n unique strings
  s <- sprintf("%03d", seq_len(n))
 
  # calculate sequence of indicies for second part of strings e.g. "002" "003" etc in "i001.002" "i001.003"...
  ind <- lapply(2:n, ":", n)
 
  # paste strings together
  unlist(lapply(1:(n-1), function(i) paste("i", s[i], ".", s[ind[[i]]], sep = "") ), use.names = FALSE)
}
 
# check
identical(generateIndex10(500), generateIndex13(500)
[1] TRUE

Which we can then benchmark as follows (compiled versions were not included because of constraints on time):

# load packages
library(ggplot2)
library(rbenchmark)
 
get_bm <- function(n) {
  print(n)
  gc()
  df = benchmark(generateIndex10(n), generateIndex11(n), generateIndex12(n), generateIndex13(n), 
                 columns = c("test", "elapsed", "relative"),
                 order = "elapsed",
                 replications = 100)
  df$n = n
  return(df)
}
 
# get benchmarks for various values of n
df.list <- lapply(seq(1000, 10000, by = 200), get_bm)
df <- do.call("rbind", df.list)
 
# plot results
ggplot(df.compiled, aes(x=n, y=elapsed, colour=test)) + geom_point() + stat_smooth() + xlab("N") + ylab("Time (seconds)")

Bearing in mind that each algorithm is replicated 100 times at each value of n, the thirteenth solution produces a speed up at larger values of n.

:tips:programming:benchmark_10_vs_11_vs_12_vs_13.jpeg

Benchmarks performed on R version 2.13.0 on a Windows Server 2008 R2 x64 Workstation PC, 12GB RAM, Intel Xeon CPU X5650 @ 2.66GHz.

Fouteenth solution (!)

:A+:Tony Breyal 2011-12-08

I noticed that the vector ind from the thirteenth solution can be eliminated altogether. Furthermore by replacing the function “paste” by “file.path” an even greater speed up over the previous solutions is possible. This use of “file.path” with parameter fsep=”” only works because there is never a character vector of length 0 for it to deal with.

generateIndex14 <- function(n) {
  # initialise vectors
  s <- vector(mode = "character", length = n)
 
  # set up n unique strings
  s <- sprintf("%03d", seq_len(n))
 
  # paste strings together
  unlist(lapply(1:(n-1), function(i) file.path("i", s[i], ".", s[(i+1):n], fsep = "") ), use.names = FALSE)
}

Timings:

               test  elapsed    n replications
 generateIndex14(n) 27.27500 2000           50
 generateIndex13(n) 33.09300 2000           50
 generateIndex12(n) 35.31344 2000           50
 generateIndex11(n) 36.32900 2000           50

Fifteenth solution (!!)

:A+: Rcpp: A New Hope — Romain Francois 2011-12-08

(reproduced here with kind permission from Romain, see original: http://romainfrancois.blog.free.fr/index.php?post/2011/11/10/Code-optimization%2C-an-Rcpp-solution )

“...I figured it was time for an Rcpp based solution. This solutions moves down Henrik Bengtsson’s idea (which was at the basis of attempt 10) down to C++. The idea was to call sprintf less than the other solutions to generate the strings “001”, “002”, “003”, ...”

library(Rcpp)
library(inline)
 
generateIndex15 <- local( {
 
    fun <- cxxfunction(
        signature( n_ = "integer", width_ = "integer", format_ = "character"  ), '
 
        int n = as<int>(n_) ;
        int width = as<int>( width_ ) ;
        const char* format = as<const char*>( format_ ) ;
        
        std::string buffer( width, \'0\' ) ;
        std::vector< std::string > elements( n ) ;
        for( int i=0; i<n; i++){
                sprintf( const_cast<char*>(buffer.data() ), 
                    format, 
                    i+1
                    ) ;
            elements[i] = buffer.c_str() ;
        }
        
        std::stringstream ss ;
        
        CharacterVector res( n*(n-1)/2) ;
        for( int i=0, k=0; i<n-1; i++){
            for(int j=i+1; j<n; j++, k++){
                ss << "i" << elements[i] << "." << elements[j];
                res[k] = ss.str(); 
                ss.str( "" ) ;
            }
        }
        return res ;
        ', plugin = "Rcpp" )
 
    function( n ){
        width <- ifelse( n<1000, 3, ceiling( log10(n+1) ) )
        fun( n, width, sprintf("%%0%dd", width)  )
    }
} )

Timings:

               test  elapsed    n replications
 generateIndex15(n) 23.30100 2000           50
 generateIndex14(n) 27.27500 2000           50
 generateIndex13(n) 33.09300 2000           50
 generateIndex12(n) 35.31344 2000           50
 generateIndex11(n) 36.32900 2000           50

Sixteenth solution (!!)

:A+: The R Strikes Back — Tony Breyal 2011-12-08

I really wanted to do this solution using Rcpp but my cpp-foo skills are still very grasshopper-y, so I gave up and did it in R instead :)

This solution works by generating the largest set of strings which start i001.xxx first and then replacing the “001” part with “002”, “003”, “004” for each increment up to n-1.

generateIndex16 <- function(n) {
  str <- vector("list", length = n-1)
  s <- vector(mode = "character", length = n)
  
  s <- sprintf("%03d", seq_len(n))
  str[[1]] <- file.path("i", s[1], ".", s[-1], fsep = "")
  
  str[2:(n-1)] <- lapply(2:(n-1), function(i) sub("001", s[i], str[[1]][(i):(n-1)], fixed=TRUE))
  unlist(str)
}

The above requires matching the “001” first and then replacing it. However, we know that “001” will ALWAYS be in character positions 2, 3 and 4, and so there may be a way to avoid matching part altogether (i.e. replace a fixed position substring with another string of equal or larger length) but I could not work out how to do that outside of a regular expression.

Timings:

               test  elapsed    n replications
 generateIndex16(n) 20.77200 2000           50
 generateIndex15(n) 23.30100 2000           50
 generateIndex14(n) 27.27500 2000           50
 generateIndex13(n) 33.09300 2000           50
 generateIndex12(n) 35.31344 2000           50
 generateIndex11(n) 36.32900 2000           50

Here is a ggplot2 graph of the timings for various sizes of n, for the last few solutions, each performed with 50 replications through the rbenchmark package:

:tips:programming:optim_sols_11-16.jpeg

Seventeenth solution (!!)

:A+: Code Optimisation: Revenge of the Rcpp — Romain Francois 2011-12-14

(reproduced here with very kind permission from Romain, see original blog post here: http://romainfrancois.blog.free.fr/index.php?post/2011/12/14/...-And-now-for-solution-17,-still-using-Rcpp )

“Here comes yet another sequel of the code optimization problem from the R wiki, still using Rcpp, but with a different strategy this time. Essentially, my previous version (15) was using stringstream although we don’t really need its functionality and it was slowing us down. Also, the characters “i” and “.” are always on the same position so we can assign them once and for all.

So without further ado, here is attempt 17:”

require(inline)
require(Rcpp)
generateIndex17 <- local( {
 
    fun <- cxxfunction(
        signature( n_ = "integer", width_ = "integer", format_ = "character"  ), '
 
        int n = as<int>(n_) ;
        int width = as<int>( width_ ) ;
        const char* format = as<const char*>( format_ ) ;
        
        std::string buffer( width, \'0\' ) ;
        std::vector< std::string > elements( n ) ;
        for( int i=0; i<n; i++){
                sprintf( const_cast<char*>(buffer.data() ), 
                    format, 
                    i+1
                    ) ;
            elements[i] = buffer.c_str() ;
        }
        
        char buf[100] ;
        buf[0] = \'i\' ;
        buf[width+1] = \'.\' ;
        
        CharacterVector res( n*(n-1)/2) ;
        for( int i=0, k=0; i<n-1; i++){
            strncpy( buf+1, elements[i].data(), width ) ;
            for(int j=i+1; j<n; j++, k++){
                strncpy( buf + 2 + width, elements[j].c_str(), width ) ;
                res[k] = buf ;
            }
        }
        return res ;
        ', plugin = "Rcpp" )
 
    function( n ){
        width <- ifelse( n<1000, 3, ceiling( log10(n+1) ) )
        fun( n, width, sprintf("%%0%dd", width)  )
    }
} )

Here is a graph I came up with on my computer comparing the 16th and 17th solution using the rbenchmark package with 50 replications for various sizes of N (note that the 17th solution is a lot faster than the former 15th Rcpp solution):

:tips:programming:rwiki_16_vs_17.jpeg

Final conclusions

The following table summarizes results:

Solution vectorized uses C code optimized speed (sec) comment
First no no --- 4.9 (n = 100) very bad code... but readable
Second no no -- 4.7 (n = 100) idem, but preallocates the vector
Third no no -- 4.7 (n = 100) no improvement over solution #2
Fourth no no - 0.33 (n = 100) optimization inside the loops
Fifth yes no + 2.2 (n = 1000) on the right way! (no loops)
Sixth no yes ++ 1.1 (n = 1000) unoptimized C code
Seventh yes no ++ 1.1 (n = 1000) optimized/vectorized R code
Eighth yes yes +++ 1.1 (n = 1000) optimized C code
Nineth yes no +++ 1.1 (n = 1000) fully optimized vect. R code
Tenth yes no ++++ 0.3 (n = 1000) extra optimized vect. R code

generateIndex8() and generateIndex9() are the fastest, but the last four solutions are really close in term of speed for n = 1000 (speed with varying n value and memory usage is not discussed here). What is important, is the huge difference between solution 1 and solution 9. I have (n = 1000):

 > system.time(generateIndex1(1000))
 [1] 3684.91    1.71 3706.53      NA      NA
 > # more than ONE HOUR
 
 > system.time(generateIndex9(1000))
 [1] 1.15 0.00 1.14   NA   NA
 > # one second

That is, more than a 3000 times increase in speed2). Yet, both functions are pure R code! If you tend to consider R as slow, are you sure you really optimized your code? (same remark for memory usage).

generateIndex1-5() are better R code than the four previous versions (I am sure one can do even better3)!), but it is a little bit more intellectual work to create these codes (i.e., you have to rethink the problem using matrix calculation, and/or, rewrite part of it in C). However, the result is worth the effort!

generateIndex5() is suboptimal in that particular context, because it calculates the whole matrix first, and then, it extracts the lower triangle. However, it is an elegant code for a full matrix solution.

Depending on the problem, C code is not always a panacea. Sometimes, well-vectorized / well-optimized R code comes close in term of speed.

1) Under Windows, you must first install ‘tools’, ‘MingW’ and ‘Perl’ and set up the ‘path’ variable adequately, according to the instruction in http://www.murdoch-sutherland.com/Rtools/.
2) generateIndex1() did not used more than 70Mb of RAM memory on a 1Gb machine; no pagination occured during that computation.
3) Please, feel free to add a tenth solution if you got something better!
 
tips/programming/code_optim2.txt · Last modified: 2011/12/19 by tony_breyal
 
Recent changes RSS feed R Wiki powered by Driven by DokuWiki and optimized for Firefox Creative Commons License