r - Speed up tapply with changing groups -


i writing function calculate difference in mean of 2 groups, groups changes each time, simple results, problem have rather large data set, speed key. "readable" version, using iris data example.

loopdif = function(nsim) {   change = numeric(nsim)   var = iris$sepal.length   (i in 1:nsim){     randomspecies = sample(c("a","b"), length(var), replace=true)     change[i]  =  diff(tapply(var,  randomspecies,  mean))   }   return(change) }  > system.time(loopdif(10000))    user  system elapsed     2.06    0.00    2.06  

the tried vectorise code:

slowdif <- function(nsim) {   change = numeric(nsim)   randomspecies = replicate(nsim,sample(c("a","b"), length(var), replace=true))   var = iris$sepal.length   change = diff(unlist(lapply(split(randomspecies, col(randomspecies)),                               function(x) unlist(lapply(split(var, x), mean)))))   return(change) }  > system.time(slowdif(10000))    user  system elapsed     1.42    0.00    1.42 

it faster now, still not faster enough, hope make under 1 second, or 0.75 seconds. reason obsessed time because have deadline meet, current code isn't fast enough.

i tried profiling tells me unlist(lapply()) part bottleneck, have no idea how rewrite it.

i appreciate if provide me alternative, suggestions. thanks.

try this:

loopdif2 <- function(nsim) {     change <- numeric(nsim)     var <- iris$sepal.length     nagroup<-rbinom(nsim,length(var),0.5)     tot<-sum(var)     (i in 1:nsim){       change[i]<-sum(var[sample(length(var),nagroup[i])])     }     change/nagroup-(tot-change)/(length(var)-nagroup) } 

in words: first extract number of elements of a group, keeping b group implicit. extract indices of a group in each iteration. evaluate sum , divide number of elements mean. other sum total sum of variable less sum of a group. mean of b group evaluated.

performance on pc:

system.time(loopdif(10000)) # user  system elapsed  #3.855   0.004   3.867  system.time(loopdif2(10000)) # user  system elapsed  #0.139   0.000   0.139  

Comments

Popular posts from this blog

html - Outlook 2010 Anchor (url/address/link) -

javascript - Why does running this loop 9 times take 100x longer than running it 8 times? -

Getting gateway time-out Rails app with Nginx + Puma running on Digital Ocean -