2 Path-dependent loops

Here is a second benchmark, mainly dealing with array indexing and loops, taken from here. The code is as follows:

library(inline)
library(rbenchmark)
library(compiler)

fun1 <- function(z) {
  for(i in 2:NROW(z)) {
    z[i] <- ifelse(z[i-1]==1, 1, 0)
  }
  z
}
fun1c <- cmpfun(fun1)


fun2 <- function(z) {
  for(i in 2:NROW(z)) {
    z[i] <- if(z[i-1]==1) 1 else 0
  }
  z
}
fun2c <- cmpfun(fun2)
z <- rep(c(1,1,0,0,0,0), 100)
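# identical() compares only its first two arguments, so check each pair explicitly
ref <- fun1(z)
stopifnot(identical(ref, fun2(z)),
          identical(ref, fun1c(z)),
          identical(ref, fun2c(z)))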
 
res <- benchmark(fun1(z), fun2(z),
                  fun1c(z), fun2c(z),
                  columns=c("test", "replications", "elapsed", "relative", "user.self", "sys.self"),
                  order="relative",
                  replications=1000)
print(res)

I ran this code on my laptop (2.4 GHz, dual core) with the following result:

     test replications elapsed  relative user.self sys.self
4 fun2c(z)         1000   0.671  1.000000     0.661    0.008
2  fun2(z)         1000   2.720  4.053651     2.704    0.015
3 fun1c(z)         1000  15.330 22.846498    15.131    0.186
1  fun1(z)         1000  17.584 26.205663    17.241    0.263
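
The only difference between fun1 and fun2 is the line inside the loop. A likely reason for the gap is that ifelse() is a vectorised function that gets called on a single element at every iteration, while if/else is a plain scalar branch. The following is a minimal sketch of that difference, not part of the original benchmark; the names x, vec_result and scalar_result are made up for illustration:

```r
# Hypothetical illustration, not part of the original benchmark.
x <- c(1, 0, 1)

# ifelse() is vectorised: a single call maps over every element of x.
vec_result <- ifelse(x == 1, 1, 0)

# if/else is a scalar branch: it tests exactly one condition.
scalar_result <- if (x[1] == 1) 1 else 0

print(vec_result)
print(scalar_result)
```

Inside a loop that touches one element at a time, the vectorised machinery of ifelse() is pure overhead, which is consistent with fun1 being much slower than fun2 in the table above.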

Let us focus on fun2, the faster version. The equivalent P code is as follows:

source('stdlib.P')
fun2 <- function(z) {
  var i;
  for(i in 2:nrow(z)) {
    z[i] <- if (z[i-1]==1) 1 else 0
  }
  return(z)
}

tm = time
z = c()
i = 0
for ( i in (1:100) ) z = c( z, c(1,1,0,0,0,0) )
repetitions = 1000
i = 0
for ( i in (1:repetitions) ) call fun2( z )
print( time-tm )

(rep is implemented in P, but currently only to repeat scalars, which is why z is built up in a loop here. Sorry.)

P needs only 0.14 seconds to run the same 1000 repetitions of fun2 on z. This is a speedup factor of 2.72 / 0.14 ≈ 19.4 compared to interpreted R and 0.671 / 0.14 ≈ 4.79 compared to R's byte compiler.

When comparing with fun1, which is much slower in R and needs 17.58 seconds, the speedup factor is 17.58 / 0.14 ≈ 126!
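
The speedup factors can be recomputed from the figures quoted above. This is only a small sketch in R: the elapsed times are copied from the benchmark table, the 0.14 seconds is the P timing stated in the text, and the variable names r_elapsed, p_elapsed and speedup are made up for illustration:

```r
# Elapsed times in seconds for 1000 replications, copied from the R table above.
r_elapsed <- c(fun2c = 0.671, fun2 = 2.720, fun1c = 15.330, fun1 = 17.584)

# P timing for fun2 as quoted in the text (assumed to cover 1000 repetitions).
p_elapsed <- 0.14

# Speedup of P relative to each R variant, rounded to one decimal place.
speedup <- round(r_elapsed / p_elapsed, 1)
print(speedup)
```

Note that this compares P's fun2 against all four R variants, since the text reports only a single P timing.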