R - Optimizing alpha and beta in negative log likelihood sum for beta binomial distribution
I'm attempting to create a sigma/summation function with variables from a dataset, which looks like this:
paste0("(choose(",zipdistrib$leads[1],",",zipdistrib$starts[1],")*beta(a+",zipdistrib$starts[1],",b+",zipdistrib$leads[1],"-",zipdistrib$starts[1],")/beta(a,b))")
When I enter the code, I get
[1] "(choose(9,6)*beta(a+6,b+9-6)/beta(a,b))"
I want to create a sigma/summation function where a and b are unknown free-floating variables, and the values of leads[i] and starts[i] are determined by the values of leads and starts for observation i in the dataset. I have tried using the sum function in conjunction with mapply and sapply, to no avail. Currently, I am taking the tack of creating the function string with a for loop in conjunction with the paste0 command, so that the terms change with the values of the variables leads and starts. Then, I try coercing the result into a function. To my surprise, I can enter the code without creating a syntax error, but when I try to optimize the function over the variables a and b, I'm not having any success.
Here's my attempt to create a function out of the string:
betafcn <- function (a,b) {
  abfcnstring <- for (i in 1:length(zipdistrib$zip5))
    toString(
      paste0(" (choose(",zipdistrib$leads[i],",",zipdistrib$starts[i],")*beta(a+",zipdistrib$starts[i],",b+",zipdistrib$leads[i],"-",zipdistrib$starts[i],")/beta(a,b))+")
    )
  as.function(
    as.list(
      substr(abfcnstring, 1, nchar(abfcnstring)-1)
    )
  )
}
Then, when I try to optimize the function over a and b, I get the following:
optim(c(a=.03, b=100), betafcn(a,b))
## Error in as.function.default(x, envir) : argument must have length at least 1
Is there a better way for me to compile a sigma from i=1 to the length of the dataset using mapply or lapply or some other *apply function? Or am I stuck using the dreaded for loop? And once I create the function, how do I make sure that I can optimize over a and b?
Update
This is what the dataset looks like:
leads <- c(7,4,2)
sales <- c(3,1,0)
zipcodes <- factor(c("11111", "22222", "33333"))
zipleads <- data.frame(zipcode=zipcodes, leads=leads, sales=sales)
zipleads
##   zipcode leads sales
## 1   11111     7     3
## 2   22222     4     1
## 3   33333     2     0
My goal is to create a function like this:
betafcn <- function (a,b) {
  (choose(7,3)*beta(a+3,b+7-3)/beta(a,b))+
  (choose(4,1)*beta(a+1,b+4-1)/beta(a,b))+
  (choose(2,0)*beta(a+0,b+2-0)/beta(a,b))
}
The only difference is that ideally I would be able to replace the dataset values with any other possible vectors of leads and sales.
Since R vectorizes most of its operations by default, you can write the expression in terms of single values of a and b (which will automatically be recycled to the length of the data) and vectors of x and y (i.e., leads and sales); if you compute the results on the log scale, you can use sum() (rather than prod()) to combine them. I think you're looking for something like:
betafcn <- function(a, b, x, y, log=FALSE) {
    ## per-observation beta-binomial term (x = leads/trials, y = sales/successes)
    r <- lchoose(x,y) + lbeta(a+y, b+x-y) - lbeta(a,b)
    if (log) r else exp(r)
}
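For example, a single call with the leads and sales vectors from the question returns one log-scale contribution per observation (a usage sketch I've added; a = 1, b = 1 are arbitrary values):

## one log-scale term per row of the data
betafcn(a = 1, b = 1, x = c(7, 4, 2), y = c(3, 1, 0), log = TRUE)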
Note that (1) optim() minimizes by default and (2) if you're trying to optimize a likelihood you're better off optimizing the log-likelihood instead ...
Since all of the internal functions (+, lchoose, lbeta) are vectorized, you should be able to apply this across the whole data set via:
zipleads <- data.frame(leads=c(7,4,2), sales=c(3,1,0))
objfun <- function(p) {
    ## negative log-likelihood
    -sum(betafcn(p[1], p[2], zipleads$leads, zipleads$sales, log=TRUE))
}
objfun(c(1,1))
optim(fn=objfun, par=c(1,1))
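If you want to keep the fitted values around, you can store the result of optim() and inspect its components (a small sketch I've added; the name fit is mine):

fit <- optim(fn = objfun, par = c(1, 1))
fit$par    ## estimated shape parameters (a, b)
fit$value  ## negative log-likelihood at the optimum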
I got crazy answers for this example (extremely large values of both shape parameters), but I think that's because it's awfully hard to fit a two-parameter model to 3 data points!
Since the shape parameters of the beta-binomial (which is what this appears to be) have to be positive, you might run into trouble with unconstrained optimization. You can use method="L-BFGS-B", lower=c(0,0), or optimize the parameters on the log scale ...
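A minimal sketch of both options (the small positive lower bound and the objfun_log wrapper are illustrative choices of mine, not from the original code):

## Option 1: box-constrained optimization keeps a and b positive
optim(fn = objfun, par = c(1, 1), method = "L-BFGS-B", lower = c(1e-8, 1e-8))

## Option 2: optimize log(a), log(b) and back-transform
objfun_log <- function(logp) objfun(exp(logp))
fit2 <- optim(fn = objfun_log, par = c(0, 0))
exp(fit2$par)  ## estimated (a, b) on the original scale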