This post derives the marginal distribution of the data for a Poisson model with a Gamma prior distribution. The idea comes from Chapter 2 of Bayesian Data Analysis (BDA), 3rd Edition, page 49.
Figure 1 shows that most of the shaded counties are located in the middle of the country (Great Plains).
Both Figure 1 and Figure 2 show that the Great Plains has both the highest and the lowest rates. Recall that the reason for this is sample size. The Great Plains has many low-population counties, and for a rare cause of death such as kidney cancer their raw death rates are noisy enough to land at both extremes, so these counties show up in both maps. Neither map is evidence that cancer rates there are actually high (please read page 47 of the excellent book for more details).
These misleading patterns in the maps of raw death rates suggest that a model-based approach to estimating the true underlying rates might be helpful. Let’s construct a likelihood from a Poisson distribution.
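The sample-size effect can be made concrete with a small sketch. It assumes that the death count in a county is Poisson with mean 10 × population × rate, and it uses a single hypothetical rate of 5e-5 deaths per person per year for every county (an illustrative value, not a figure from the book). The function name `raw_rate_summary` is likewise made up for this example.

```python
import math

def raw_rate_summary(n, theta, years=10):
    """Summarize how noisy the raw death rate y / (years * n) is.

    Assumes y ~ Poisson(years * n * theta); n is the county population and
    theta the (hypothetical) true rate in deaths per person per year.
    """
    expected_deaths = years * n * theta
    # Probability of observing no deaths at all, i.e. a raw rate of exactly 0.
    p_zero_deaths = math.exp(-expected_deaths)
    # sd(raw rate) / mean(raw rate) = sqrt(lambda) / lambda for a Poisson count.
    relative_sd = 1.0 / math.sqrt(expected_deaths)
    return expected_deaths, p_zero_deaths, relative_sd

THETA = 5e-5  # hypothetical rate shared by all counties
for n in (1_000, 10_000, 1_000_000):
    expected, p_zero, rel_sd = raw_rate_summary(n, THETA)
    print(f"n={n:>9,d}  expected deaths={expected:7.1f}  "
          f"P(y=0)={p_zero:.3f}  relative sd of raw rate={rel_sd:.2f}")
```

Under these assumptions a county of 1,000 people expects only half a death per decade, so it will often report a raw rate of zero and occasionally a rate several times the true one, while a county of one million people stays within a few percent of the true rate.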
$$y_j \mid \theta_j \sim \mathrm{Poisson}(10 n_j \theta_j), \tag{1}$$
where $y_j$ denotes the number of kidney cancer deaths in county $j$ from 1980-1989, $n_j$ is the population of the county, and $\theta_j$ is the underlying rate in units of deaths per person per year.
The conjugate prior for the Poisson model is a Gamma distribution with parameters α and β:
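$$\theta_j \sim \mathrm{Gamma}(\alpha, \beta). \tag{2}$$
Multiplying the likelihood in Equation (1) by the prior in Equation (2) gives, up to a normalizing constant, the posterior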
$$\theta_j \mid y_j \sim \mathrm{Gamma}(\alpha + y_j, \beta + 10 n_j). \tag{3}$$
Recall that Bayes' rule states that
$$\Pr(\theta_j \mid y_j) = \frac{\Pr(y_j \mid \theta_j)\,\Pr(\theta_j)}{\Pr(y_j)} \iff \Pr(y_j) = \frac{\Pr(y_j \mid \theta_j)\,\Pr(\theta_j)}{\Pr(\theta_j \mid y_j)}. \tag{4}$$
Specifically, we will derive the predictive distribution, that is, the marginal distribution of $y_j$ averaged over the prior distribution of $\theta_j$, or, in short, $\Pr(y_j)$. The objective of this post is to show this derivation.
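One consequence of the Bayes' rule identity is worth checking before doing any algebra: the ratio likelihood × prior / posterior must give the same number whatever value of the rate is plugged in. The sketch below checks this numerically using the standard Poisson mass function and the Gamma density in its shape/rate form. The values of `alpha`, `beta`, `y`, and `n` are illustrative assumptions, and the helper names are made up for this example.

```python
import math

def log_poisson_pmf(y, rate):
    # log of rate^y * exp(-rate) / y!
    return y * math.log(rate) - rate - math.lgamma(y + 1)

def log_gamma_pdf(theta, shape, rate):
    # log of rate^shape / Gamma(shape) * theta^(shape - 1) * exp(-rate * theta)
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(theta) - rate * theta)

def log_marginal_via_bayes(y, n, alpha, beta, theta):
    """log Pr(y) computed as log likelihood + log prior - log posterior."""
    exposure = 10 * n  # ten years of exposure for a county of population n
    log_lik = log_poisson_pmf(y, exposure * theta)
    log_prior = log_gamma_pdf(theta, alpha, beta)
    log_post = log_gamma_pdf(theta, alpha + y, beta + exposure)
    return log_lik + log_prior - log_post

# Illustrative (assumed) prior parameters and data for one county.
alpha, beta = 20.0, 430000.0
y, n = 3, 50000

# Evaluate the ratio at several arbitrary rates; the results should agree.
values = [log_marginal_via_bayes(y, n, alpha, beta, t) for t in (1e-5, 5e-5, 2e-4)]
print(values)
```

The three printed numbers agree up to floating-point rounding, which is exactly why every factor involving the rate has to cancel in the derivation that follows.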
How do we derive $\Pr(y_j)$?
First, we have the likelihood, a Poisson distribution, as shown in Equation (1):
$$\Pr(y_j \mid \theta_j) = \frac{1}{y_j!} (10 n_j \theta_j)^{y_j} e^{-10 n_j \theta_j}. \tag{5}$$
Second, we have the prior, a Gamma distribution, as described in Equation (2):
$$\Pr(\theta_j) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \theta_j^{\alpha - 1} e^{-\beta \theta_j}. \tag{6}$$
Last but not least, we have the posterior distribution, a Gamma distribution, as shown in Equation (3):
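$$\Pr(\theta_j \mid y_j) = \frac{(\beta + 10 n_j)^{\alpha + y_j}}{\Gamma(\alpha + y_j)} \theta_j^{\alpha + y_j - 1} e^{-(\beta + 10 n_j) \theta_j}. \tag{7}$$
Let’s substitute Equations (5), (6), and (7) into Equation (4) as follows: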
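\begin{align}
\Pr(y_j) &= \frac{\dfrac{1}{y_j!} (10 n_j \theta_j)^{y_j} e^{-10 n_j \theta_j} \times \dfrac{\beta^{\alpha}}{\Gamma(\alpha)} \theta_j^{\alpha - 1} e^{-\beta \theta_j}}{\dfrac{(\beta + 10 n_j)^{\alpha + y_j}}{\Gamma(\alpha + y_j)} \theta_j^{\alpha + y_j - 1} e^{-(\beta + 10 n_j) \theta_j}} \tag{8} \\
&= \frac{1}{y_j!} \, \frac{(10 n_j)^{y_j} \theta_j^{y_j} e^{-10 n_j \theta_j} \, \beta^{\alpha} \, \theta_j^{\alpha - 1} e^{-\beta \theta_j} \, \Gamma(\alpha + y_j)}{\Gamma(\alpha) \, (\beta + 10 n_j)^{\alpha + y_j} \, \theta_j^{y_j} \theta_j^{\alpha - 1} e^{-\beta \theta_j} e^{-10 n_j \theta_j}} \tag{9} \\
&= \frac{1}{y_j!} \, \frac{(10 n_j)^{y_j} \beta^{\alpha} \, \Gamma(\alpha + y_j)}{\Gamma(\alpha) \, (\beta + 10 n_j)^{\alpha + y_j}} \times \frac{\theta_j^{y_j} \theta_j^{\alpha - 1} e^{-\beta \theta_j} e^{-10 n_j \theta_j}}{\theta_j^{y_j} \theta_j^{\alpha - 1} e^{-\beta \theta_j} e^{-10 n_j \theta_j}} \tag{10} \\
&= \frac{1}{y_j!} \, \frac{(10 n_j)^{y_j} \beta^{\alpha} \, \Gamma(\alpha + y_j)}{\Gamma(\alpha) \, (\beta + 10 n_j)^{\alpha + y_j}} \tag{11} \\
&= \frac{1}{y_j!} \, \frac{\Gamma(\alpha + y_j)}{\Gamma(\alpha)} \, \frac{(10 n_j)^{y_j}}{(\beta + 10 n_j)^{\alpha + y_j}} \, \beta^{\alpha} \tag{12} \\
&= \frac{1}{y_j!} \, \frac{\Gamma(\alpha + y_j)}{\Gamma(\alpha)} \, \frac{(10 n_j)^{y_j}}{(\beta + 10 n_j)^{y_j}} \, \frac{\beta^{\alpha}}{(\beta + 10 n_j)^{\alpha}} \tag{13} \\
&= \frac{1}{y_j!} \, \frac{(\alpha + y_j - 1)!}{(\alpha - 1)!} \, \frac{(10 n_j)^{y_j}}{(\beta + 10 n_j)^{y_j}} \, \frac{\beta^{\alpha}}{(\beta + 10 n_j)^{\alpha}} && \text{because } \Gamma(n) = (n - 1)! \text{ for a positive integer } n \tag{14} \\
&= \binom{\alpha + y_j - 1}{\alpha - 1} \left( \frac{\beta}{\beta + 10 n_j} \right)^{\alpha} \left( \frac{10 n_j}{\beta + 10 n_j} \right)^{y_j} && \text{because } \binom{n}{r} = \frac{n!}{r! \, (n - r)!} \tag{15} \\
&= \binom{y_j + \alpha - 1}{\alpha - 1} \left( \frac{\frac{\beta}{10 n_j}}{\frac{\beta}{10 n_j} + 1} \right)^{\alpha} \left( \frac{1}{\frac{\beta}{10 n_j} + 1} \right)^{y_j} \tag{16}
\end{align}
In Equation (10) the last fraction equals 1, so every factor involving $\theta_j$ cancels. Equation (14) writes the Gamma functions as factorials, which treats $\alpha$ as a positive integer. Equation (16) follows from dividing the numerator and denominator of each fraction in Equation (15) by $10 n_j$. Recall that the negative binomial distribution, Neg-bin$(\alpha, \beta)$, has the probability mass function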
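$$\Pr(\theta) = \binom{\theta + \alpha - 1}{\alpha - 1} \left( \frac{\beta}{\beta + 1} \right)^{\alpha} \left( \frac{1}{\beta + 1} \right)^{\theta}, \quad \theta = 0, 1, 2, \ldots \tag{17}$$
Equation (16) has exactly this form, with $y_j$ in place of $\theta$ and $\beta / (10 n_j)$ in place of $\beta$. Therefore, we conclude that $\Pr(y_j)$ in Equation (16) is indeed a negative binomial distribution,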
$$y_j \sim \text{Neg-bin}\left( \alpha, \frac{\beta}{10 n_j} \right), \tag{18}$$
as explained on page 49 of the book.
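As a sanity check on the result, the closed-form negative binomial probability can be compared with a direct numerical integration of the Poisson likelihood against the Gamma prior. The sketch below is one way to do this with the Python standard library only. The prior parameters, county population, and death count are illustrative assumptions, the function names are made up for this example, and the binomial coefficient is written with log-Gamma functions so that a non-integer `alpha` also works.

```python
import math

def neg_bin_pmf(y, alpha, b):
    """Neg-bin(alpha, b) mass at y, in the parameterization derived above."""
    # log of Gamma(alpha + y) / (Gamma(alpha) * y!), the binomial coefficient.
    log_coef = math.lgamma(alpha + y) - math.lgamma(alpha) - math.lgamma(y + 1)
    return math.exp(log_coef + alpha * math.log(b / (b + 1.0)) - y * math.log(b + 1.0))

def marginal_by_integration(y, n, alpha, beta, upper, steps=20000):
    """Midpoint-rule integral of Poisson(y | 10 n theta) * Gamma(theta | alpha, beta)."""
    exposure = 10 * n
    h = upper / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        log_lik = y * math.log(exposure * theta) - exposure * theta - math.lgamma(y + 1)
        log_prior = (alpha * math.log(beta) - math.lgamma(alpha)
                     + (alpha - 1) * math.log(theta) - beta * theta)
        total += math.exp(log_lik + log_prior)
    return total * h

# Illustrative (assumed) values: prior parameters, population, observed deaths.
alpha, beta = 20.0, 430000.0
n, y = 50000, 3

closed_form = neg_bin_pmf(y, alpha, beta / (10 * n))
# The integrand is negligible beyond theta = 3e-4 for these values.
numeric = marginal_by_integration(y, n, alpha, beta, upper=3e-4)
print(closed_form, numeric)

# The mass function should also sum to 1 over y = 0, 1, 2, ...
total_mass = sum(neg_bin_pmf(k, alpha, beta / (10 * n)) for k in range(400))
print(total_mass)
```

For these values the two probabilities agree closely and the masses sum to 1 up to rounding. The upper integration limit and the number of steps were chosen for this particular prior and would need adjusting for other parameter values.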