Deriving Marginal Distribution from Poisson & Gamma Conjugate Pair

This post shows the derivation of the marginal distribution from a Poisson model with a Gamma prior distribution. Specifically, the idea comes from Chapter 2 of Bayesian Data Analysis (BDA) 3rd Edition, page 49.

Figure 1: The counties of the United States with the highest 10% age-standardized death rates for cancer of kidney/ureter for U.S. white males, 1980-1989. Image taken from BDA 3rd Edition, some rights reserved.

Figure 1 shows that most of the shaded counties are located in the middle of the country (Great Plains).

Figure 2: The counties of the United States with the lowest 10% age-standardized death rates for cancer of kidney/ureter for U.S. white males, 1980-1989. Interestingly, the pattern is somewhat similar to the map of the highest rates in Figure 1. Image taken from BDA 3rd Edition, some rights reserved.

Figures 1 and 2 together show that the Great Plains contains both the highest and the lowest rates. Recall that the reason for this is sample size: the Great Plains has many low-population counties, so extreme observed rates for a rare cause of death, such as kidney cancer, show up there in both maps. Neither map provides evidence that the underlying cancer rates are actually high (please read page 47 of the excellent book for more details).

These misleading patterns in the maps of raw death rates suggest that a Poisson model-based approach to estimating the true underlying rates might be helpful. Let's construct a likelihood from a Poisson distribution:

$$
y_j \mid \theta_j \sim \text{Poisson}(10\, n_j \theta_j), \tag{1}
$$

where $y_j$ denotes the number of kidney cancer deaths in county $j$ from 1980-1989, $n_j$ is the population of the county, and $\theta_j$ is the underlying rate in units of deaths per person per year. The factor of 10 appears because the counts cover a 10-year period.
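To make the notation concrete, here is a tiny simulation sketch of this sampling model. The population and rate values below are made up for illustration (they are not taken from the book); the point is that raw death rates from small counties are much noisier, which is exactly what Figures 1 and 2 illustrate.

```python
# A tiny simulation of the sampling model in Equation (1).
# theta_j and n_j values are made-up illustrative values, not from the book.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_j = np.array([1_000, 10_000, 100_000])  # hypothetical county populations
theta_j = np.array([5e-5, 5e-5, 5e-5])    # same underlying rate (deaths/person/year)

# 10-year death counts: y_j | theta_j ~ Poisson(10 * n_j * theta_j)
y_j = stats.poisson.rvs(10 * n_j * theta_j, random_state=rng)

# Raw death rates y_j / (10 n_j) vary far more for the small county,
# even though the underlying theta_j is identical across counties.
print(y_j, y_j / (10 * n_j))
```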

The conjugate prior for the Poisson model is the Gamma distribution with parameters $\alpha$ and $\beta$:

$$
\theta_j \sim \text{Gamma}(\alpha, \beta). \tag{2}
$$

By multiplying Equations (1) and (2), we obtain the posterior

$$
\theta_j \mid y_j \sim \text{Gamma}(\alpha + y_j,\ \beta + 10 n_j). \tag{3}
$$
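Before deriving anything analytically, we can sanity-check this conjugate update numerically. The sketch below (with made-up values for $\alpha$, $\beta$, $n_j$, and $y_j$; they are not the ones used in the book) verifies that the Poisson likelihood times the Gamma prior is proportional to the $\text{Gamma}(\alpha + y_j, \beta + 10 n_j)$ density, i.e., their ratio is constant in $\theta_j$.

```python
# Numerical sanity check of the conjugate update in Equation (3).
# alpha, beta, n_j, and y_j are made-up illustrative values.
import numpy as np
from scipy import stats

alpha, beta = 2.0, 20_000.0   # prior hyperparameters (illustrative)
n_j, y_j = 1_000, 3           # county population and observed deaths (illustrative)

thetas = np.linspace(1e-5, 5e-4, 5)

# Unnormalized posterior: Poisson likelihood times Gamma prior.
unnorm = (stats.poisson.pmf(y_j, 10 * n_j * thetas)
          * stats.gamma.pdf(thetas, a=alpha, scale=1 / beta))

# Conjugate posterior density Gamma(alpha + y_j, beta + 10 n_j).
post = stats.gamma.pdf(thetas, a=alpha + y_j, scale=1 / (beta + 10 * n_j))

# The ratio is constant across theta; that constant is exactly Pr(y_j).
print(unnorm / post)
```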

Recall that Bayes' rule states that

$$
\Pr(\theta_j \mid y_j) = \frac{\Pr(y_j \mid \theta_j)\,\Pr(\theta_j)}{\Pr(y_j)}
\quad\Longleftrightarrow\quad
\Pr(y_j) = \frac{\Pr(y_j \mid \theta_j)\,\Pr(\theta_j)}{\Pr(\theta_j \mid y_j)}. \tag{4}
$$

Specifically, we will derive the predictive distribution: the marginal distribution of $y_j$, averaging over the prior distribution of $\theta_j$, or, in short, $\Pr(y_j)$. The objective of this post is to show this derivation.

How do we derive $\Pr(y_j)$?

Firstly, we have the likelihood, a Poisson distribution, as shown in Equation (1)

$$
\Pr(y_j \mid \theta_j) = \frac{1}{y_j!}\,(10 n_j \theta_j)^{y_j}\, e^{-10 n_j \theta_j}. \tag{5}
$$

Secondly, we also have the prior, a Gamma distribution, as described in Equation (2)

$$
\Pr(\theta_j) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \theta_j^{\alpha - 1}\, e^{-\beta \theta_j}. \tag{6}
$$

Last but not least, we have our posterior distribution, a Gamma distribution, as shown in Equation (3)

$$
\Pr(\theta_j \mid y_j) = \frac{(\beta + 10 n_j)^{\alpha + y_j}}{\Gamma(\alpha + y_j)}\, \theta_j^{\alpha + y_j - 1}\, e^{-(\beta + 10 n_j)\theta_j}. \tag{7}
$$

Let’s substitute Equations (5), (6), and (7) into Equation (4) as follows:

$$
\begin{aligned}
\Pr(y_j)
&= \frac{\dfrac{1}{y_j!}\,(10 n_j \theta_j)^{y_j}\, e^{-10 n_j \theta_j} \times \dfrac{\beta^{\alpha}}{\Gamma(\alpha)}\, \theta_j^{\alpha-1}\, e^{-\beta \theta_j}}
        {\dfrac{(\beta + 10 n_j)^{\alpha + y_j}}{\Gamma(\alpha + y_j)}\, \theta_j^{\alpha + y_j - 1}\, e^{-(\beta + 10 n_j)\theta_j}} \\[8pt]
&= \frac{1}{y_j!}\,
   \frac{(10 n_j)^{y_j}\, \theta_j^{y_j}\, e^{-10 n_j \theta_j}\, \beta^{\alpha}\, \theta_j^{\alpha-1}\, e^{-\beta \theta_j}\, \Gamma(\alpha + y_j)}
        {\Gamma(\alpha)\,(\beta + 10 n_j)^{\alpha + y_j}\, \theta_j^{y_j}\, \theta_j^{\alpha-1}\, e^{-\beta \theta_j}\, e^{-10 n_j \theta_j}} \\[8pt]
&= \frac{1}{y_j!}\,
   \frac{(10 n_j)^{y_j}\, \beta^{\alpha}\, \Gamma(\alpha + y_j)}
        {\Gamma(\alpha)\,(\beta + 10 n_j)^{\alpha + y_j}}
   && \text{(the $\theta_j$ and exponential factors cancel)} \\[8pt]
&= \frac{1}{y_j!}\,\frac{\Gamma(\alpha + y_j)}{\Gamma(\alpha)}\,
   \frac{(10 n_j)^{y_j}}{(\beta + 10 n_j)^{\alpha + y_j}}\, \beta^{\alpha} \\[8pt]
&= \frac{1}{y_j!}\,\frac{\Gamma(\alpha + y_j)}{\Gamma(\alpha)}\,
   \frac{(10 n_j)^{y_j}}{(\beta + 10 n_j)^{y_j}}\,
   \frac{\beta^{\alpha}}{(\beta + 10 n_j)^{\alpha}} \\[8pt]
&= \frac{1}{y_j!}\,\frac{(\alpha + y_j - 1)!}{(\alpha - 1)!}\,
   \frac{(10 n_j)^{y_j}}{(\beta + 10 n_j)^{y_j}}\,
   \frac{\beta^{\alpha}}{(\beta + 10 n_j)^{\alpha}}
   && \text{because } \Gamma(n) = (n-1)! \\[8pt]
&= \binom{\alpha + y_j - 1}{\alpha - 1}
   \left(\frac{\beta}{\beta + 10 n_j}\right)^{\alpha}
   \left(\frac{10 n_j}{\beta + 10 n_j}\right)^{y_j}
   && \text{because } \binom{n}{r} = \frac{n!}{r!\,(n-r)!} \\[8pt]
&= \binom{y_j + \alpha - 1}{\alpha - 1}
   \left(\frac{\frac{\beta}{10 n_j}}{\frac{\beta}{10 n_j} + 1}\right)^{\alpha}
   \left(\frac{1}{\frac{\beta}{10 n_j} + 1}\right)^{y_j}
\end{aligned}
$$
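As a quick check on the algebra (this check is mine, not from the book, and the $\alpha$, $\beta$, and $n_j$ values are illustrative), we can compare the closed-form expression in the last line above with a direct numerical integration of $\int \Pr(y_j \mid \theta_j)\,\Pr(\theta_j)\, d\theta_j$:

```python
# Check the derivation numerically: the closed-form expression should match
# the marginal obtained by integrating out theta_j.
# alpha, beta, and n_j are made-up illustrative values.
import numpy as np
from scipy import stats, special
from scipy.integrate import quad

alpha, beta = 2.0, 20_000.0   # prior hyperparameters (illustrative)
n_j = 1_000                   # county population (illustrative)

def marginal_closed_form(y):
    """Last line of the derivation, written with p = beta / (beta + 10 n_j)."""
    p = beta / (beta + 10 * n_j)
    return special.binom(y + alpha - 1, alpha - 1) * p**alpha * (1 - p)**y

def marginal_by_integration(y):
    """Pr(y_j) = integral of Poisson(y | 10 n_j theta) * Gamma(theta | alpha, beta) d theta."""
    integrand = lambda theta: (stats.poisson.pmf(y, 10 * n_j * theta)
                               * stats.gamma.pdf(theta, a=alpha, scale=1 / beta))
    # Integrate over a range covering essentially all of the posterior mass.
    upper = stats.gamma.ppf(1 - 1e-12, a=alpha + y, scale=1 / (beta + 10 * n_j))
    value, _ = quad(integrand, 0, upper)
    return value

for y in range(5):
    print(y, marginal_closed_form(y), marginal_by_integration(y))
```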

Recall that a negative binomial distribution, $\text{Neg-bin}(\alpha, \beta)$, has probability mass function

$$
\Pr(\theta) = \binom{\theta + \alpha - 1}{\alpha - 1}
\left(\frac{\beta}{\beta + 1}\right)^{\alpha}
\left(\frac{1}{\beta + 1}\right)^{\theta},
\qquad \theta = 0, 1, 2, \ldots
$$

Therefore, we conclude that the final expression for $\Pr(y_j)$ above is indeed a negative binomial distribution,

$$
y_j \sim \text{Neg-bin}\!\left(\alpha,\ \frac{\beta}{10 n_j}\right)
$$

as explained on page 49 of the book.
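For completeness, here is the same conclusion checked against SciPy's negative binomial (again with my illustrative $\alpha$, $\beta$, $n_j$ values, not the book's). In SciPy's parameterization, `nbinom(n, p)` has pmf $\binom{k+n-1}{n-1} p^n (1-p)^k$, so $\text{Neg-bin}(\alpha, \beta/(10 n_j))$ corresponds to $n = \alpha$ and $p = \beta/(\beta + 10 n_j)$; its first few pmf values should match the numbers from the integration check above.

```python
# The conclusion above, expressed with scipy.stats.nbinom.
# alpha, beta, and n_j are made-up illustrative values.
from scipy import stats

alpha, beta, n_j = 2.0, 20_000.0, 1_000

# Neg-bin(alpha, beta / (10 n_j)) in the book's notation corresponds to
# scipy's nbinom with n = alpha and p = beta / (beta + 10 * n_j).
marginal = stats.nbinom(alpha, beta / (beta + 10 * n_j))

print([marginal.pmf(y) for y in range(5)])  # Pr(y_j = 0), ..., Pr(y_j = 4)
```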


Written on December 23, 2020