r - Optimizing alpha and beta in a negative log-likelihood sum for the beta-binomial distribution


I'm attempting to create a sigma/summation function from the variables in my dataset. It looks like this:

paste0("(choose(",zipdistrib$leads[1],",",zipdistrib$starts[1],")*beta(a+",zipdistrib$starts[1],",b+",zipdistrib$leads[1],"-",zipdistrib$starts[1],")/beta(a,b))") 

When I enter this code, I get:

[1] "(choose(9,6)*beta(a+6,b+9-6)/beta(a,b))" 

I want to create a sigma/summation function where a and b are unknown free-floating variables, and the values of leads[i] and starts[i] are determined by the values of leads and starts for observation i in the dataset. I have tried using the sum function in conjunction with mapply and sapply to no avail. Currently, I am taking the tack of creating the function as a string, using a for loop in conjunction with the paste0 command so that the values of the variables leads and starts change at each step. Then, I try coercing the result into a function. To my surprise, I can enter the code without creating a syntax error, but when I try to optimize the function over the variables a and b, I'm not having any success.

Here's my attempt to create the function out of the string:

betafcn <- function (a, b) {
  abfcnstring <- for (i in 1:length(zipdistrib$zip5))
    toString(
      paste0("(choose(", zipdistrib$leads[i], ",", zipdistrib$starts[i],
             ")*beta(a+", zipdistrib$starts[i], ",b+", zipdistrib$leads[i],
             "-", zipdistrib$starts[i], ")/beta(a,b))+")
    )
  as.function(
    as.list(
      substr(abfcnstring, 1, nchar(abfcnstring) - 1)
    )
  )
}

Then when I try to optimize the function over a and b, I get the following:

optim(c(a=.03, b=100), betafcn(a,b))
## Error in as.function.default(x, envir) :
##   argument must have length at least 1

Is there a better way for me to compile a sum over i = 1 to the length of the dataset with mapply, lapply, or some other *apply function? Or am I stuck using the dreaded for loop? And once I create the function, how do I make sure that I can optimize over a and b?

update

This is what my dataset looks like:

leads <- c(7,4,2)
sales <- c(3,1,0)
zipcodes <- factor(c("11111", "22222", "33333"))
zipleads <- data.frame(zipcode=zipcodes, leads=leads, sales=sales)
zipleads
##   zipcode leads sales
## 1   11111     7     3
## 2   22222     4     1
## 3   33333     2     0

My goal is to create a function like this:

betafcn <- function (a, b) {
  (choose(7,3)*beta(a+3,b+7-3)/beta(a,b)) +
  (choose(4,1)*beta(a+1,b+4-1)/beta(a,b)) +
  (choose(2,0)*beta(a+0,b+2-0)/beta(a,b))
}

The difference is that, ideally, I would be able to replace the dataset values with any other possible vectors for leads and sales.

Since R vectorizes most of its operations by default, you can write the expression in terms of single values of a and b (which will automatically be recycled to the length of the data) and vectors of x and y (i.e., leads and sales); if you compute the result on the log scale, you can use sum() (rather than prod()) to combine the terms. I think you're looking for something like:

betafcn <- function(a, b, x, y, log=FALSE) {
  r <- lchoose(x,y) + lbeta(a+y, b+x-y) - lbeta(a,b)
  if (log) r else exp(r)
}
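As a quick sanity check (my addition, not part of the original answer): with a = b = 1 the beta-binomial reduces to a discrete uniform distribution on 0..x, so the density should come out to exactly 1/(x+1).

```r
# Vectorized beta-binomial density on the log scale, as above
betafcn <- function(a, b, x, y, log = FALSE) {
  r <- lchoose(x, y) + lbeta(a + y, b + x - y) - lbeta(a, b)
  if (log) r else exp(r)
}

# With a = b = 1, P(y | x) should equal 1/(x+1) for every y in 0..x
betafcn(1, 1, 7, 3)  # 1/8 = 0.125
betafcn(1, 1, 4, 1)  # 1/5 = 0.2
```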

Note that (1) optim() minimizes by default, and (2) if you're trying to optimize a likelihood you're better off optimizing the log-likelihood instead ...

Since all of the internal functions (+, lchoose, lbeta) are vectorized, you should be able to apply this across the whole data set via:

zipleads <- data.frame(leads=c(7,4,2), sales=c(3,1,0))
objfun <- function(p) {  ## negative log-likelihood
    -sum(betafcn(p[1], p[2], zipleads$leads, zipleads$sales,
                 log=TRUE))
}
objfun(c(1,1))
optim(fn=objfun, par=c(1,1))

I got crazy answers for this example (extremely large values of both shape parameters), but I think that's because it's awfully hard to fit a two-parameter model to three data points!

Since the shape parameters of the beta-binomial (which is what this appears to be) have to be positive, you might run into trouble with unconstrained optimization. You can either use method="L-BFGS-B", lower=c(0,0), or optimize the parameters on the log scale ...
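The log-scale approach mentioned above can be sketched as follows (a minimal sketch, assuming the `betafcn` and `zipleads` definitions from earlier; `objfun_log` is my own name for the reparameterized objective): optimize over log(a) and log(b), so that exp() of the parameters is always positive and no box constraints are needed.

```r
# Beta-binomial density on the log scale (as defined in the answer)
betafcn <- function(a, b, x, y, log = FALSE) {
  r <- lchoose(x, y) + lbeta(a + y, b + x - y) - lbeta(a, b)
  if (log) r else exp(r)
}
zipleads <- data.frame(leads = c(7, 4, 2), sales = c(3, 1, 0))

# Negative log-likelihood as a function of log-scale parameters
objfun_log <- function(lp) {
  p <- exp(lp)  # back-transform: shape parameters are guaranteed positive
  -sum(betafcn(p[1], p[2], zipleads$leads, zipleads$sales, log = TRUE))
}

fit <- optim(fn = objfun_log, par = c(0, 0))  # start at a = b = 1
exp(fit$par)  # estimates on the original (positive) scale
```

With only three observations the estimates will still wander off toward extreme values, as noted above, but the optimizer can no longer step into a region where the density is undefined.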
