You might have heard that the bootstrap procedure can be seen as an approximation of Bayesian inference. This post presents two examples to illustrate this relationship. The first example is based on continuous data with a normal distribution, and the second involves data with a discrete distribution on a finite sample space.
Example 1: Let x = (x₁, ..., xₙ) be a random sample of size n obtained from a normal distribution N(μ, σ²), where μ is the unknown mean and σ² is a known variance. To perform a Bayesian analysis for μ, we assume a prior distribution μ ∼ N(0, τ²), which leads to the posterior distribution

μ | x ∼ N( nτ²x̄/(nτ² + σ²), σ²τ²/(nτ² + σ²) ),

where x̄ = (x₁ + ... + xₙ)/n denotes the sample mean. Note that the larger τ² is, the closer the posterior mean is to x̄, the maximum likelihood estimate (MLE) of μ. As τ² → ∞ we obtain a noninformative prior, and hence the posterior distribution μ | x ∼ N(x̄, σ²/n). This is identical to the sampling distribution of x̄* = (x₁* + ... + xₙ*)/n, where x* = (x₁*, ..., xₙ*) is a parametric bootstrap sample of size n generated from N(x̄, σ²), the sampling distribution with μ set to its MLE.
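To see the match numerically, here is a minimal sketch in Python (NumPy assumed; the values of μ, σ, and n are invented for illustration). It draws parametric bootstrap replicates of x̄* and compares them with draws from the flat-prior posterior N(x̄, σ²/n):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: data from N(mu, sigma^2) with sigma known
mu_true, sigma, n = 1.5, 2.0, 50           # hypothetical values
x = rng.normal(mu_true, sigma, size=n)
xbar = x.mean()                            # MLE of mu

B = 50_000

# Parametric bootstrap: resample from the fitted model N(xbar, sigma^2)
xbar_star = rng.normal(xbar, sigma, size=(B, n)).mean(axis=1)

# Flat-prior posterior: mu | x ~ N(xbar, sigma^2 / n)
mu_post = rng.normal(xbar, sigma / np.sqrt(n), size=B)

print(f"bootstrap mean, sd: {xbar_star.mean():.4f}, {xbar_star.std():.4f}")
print(f"posterior mean, sd: {mu_post.mean():.4f}, {mu_post.std():.4f}")
print(f"theory sd = sigma/sqrt(n): {sigma / np.sqrt(n):.4f}")
```

Both sets of draws center on x̄ with standard deviation close to σ/√n.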
Example 2: Let a = (a₁, ..., a_K) be the vector of all possible distinct values of a random variable X, and let p = (p₁, ..., p_K) be the associated vector of probabilities

pₖ = P(X = aₖ | p),  Σₖ pₖ = 1.

Let x = (x₁, ..., xₙ) be a random sample of size n from the distribution of X and p̂ = (p̂₁, ..., p̂_K) the vector of observed proportions of the values in a. To make Bayesian inference on p, we use a symmetric Dirichlet distribution as the prior: p ∼ Dir(α1), where α > 0 is a constant (a hyperparameter of the prior) and 1 denotes the vector of 1s. The posterior distribution is then p | x ∼ Dir(α1 + np̂). Letting α → 0, we obtain an improper prior, which gives the posterior distribution

p | x ∼ Dir(np̂).
Now the bootstrap distribution obtained by sampling with replacement from the data x = (x₁, ..., xₙ) can be expressed in terms of sampling the category proportions from a multinomial distribution. Specifically, the vector of observed proportions p̂* from a bootstrap sample x* = (x₁*, ..., xₙ*) has the following distribution:

p̂* ∼ Mult(n, p̂)/n,

where Mult(n, p̂) denotes the multinomial distribution with parameters n and p̂.
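As a quick sanity check, here is a sketch (again Python with NumPy; the sample space and probabilities are invented for illustration) comparing the proportions obtained by resampling the data with replacement against direct draws from Mult(n, p̂)/n:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative finite sample space a and data x
a = np.array([0.0, 1.0, 2.5, 4.0])                  # K = 4 hypothetical values
x = rng.choice(a, size=200, p=[0.1, 0.4, 0.3, 0.2])
n = len(x)
p_hat = np.array([(x == v).mean() for v in a])      # observed proportions

B = 10_000

# Route 1: resample x with replacement and record category proportions
idx = rng.integers(0, n, size=(B, n))
p_star_resample = np.stack([(x[idx] == v).mean(axis=1) for v in a], axis=1)

# Route 2: draw counts from Mult(n, p_hat) and divide by n
p_star_mult = rng.multinomial(n, p_hat, size=B) / n

print("resampling  mean:", p_star_resample.mean(axis=0).round(3))
print("multinomial mean:", p_star_mult.mean(axis=0).round(3))
print("resampling  sd  :", p_star_resample.std(axis=0).round(3))
print("multinomial sd  :", p_star_mult.std(axis=0).round(3))
```

The two routes produce the same distribution of proportions, as the matching means and standard deviations confirm.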
The bootstrap distribution of p̂* is very similar to the posterior distribution of p | x: the two distributions have the same mean, p̂, and nearly the same variance-covariance matrix (the two covariance matrices differ only by a factor of n/(n + 1)).
Since any parameter ψ(p, a) associated with the distribution of X is completely determined by p (the values a are fixed), the bootstrap distribution of ψ(p̂*, a) will closely approximate the posterior distribution of ψ(p, a).
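Continuing the sketch above (same invented a, x, and p̂; ψ is taken to be the mean Σₖ aₖpₖ purely as an example), we can compare the bootstrap and posterior distributions of a parameter directly:

```python
import numpy as np

rng = np.random.default_rng(2)

# Same illustrative setup as before
a = np.array([0.0, 1.0, 2.5, 4.0])
x = rng.choice(a, size=200, p=[0.1, 0.4, 0.3, 0.2])
n = len(x)
p_hat = np.array([(x == v).mean() for v in a])  # assumes every value of a appears in x

B = 50_000

# Example parameter: psi(p, a) = sum_k a_k * p_k, the mean of X
# Bootstrap distribution: psi(p_hat*, a) with p_hat* ~ Mult(n, p_hat)/n
psi_boot = (rng.multinomial(n, p_hat, size=B) / n) @ a

# Posterior distribution: psi(p, a) with p | x ~ Dir(n * p_hat)
psi_post = rng.dirichlet(n * p_hat, size=B) @ a

print(f"bootstrap mean, sd: {psi_boot.mean():.4f}, {psi_boot.std():.4f}")
print(f"posterior mean, sd: {psi_post.mean():.4f}, {psi_post.std():.4f}")
```

The two sets of draws are nearly indistinguishable; their standard deviations differ by roughly the factor √(n/(n + 1)) implied by the covariance comparison above.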
Note that the bootstrap distribution is obtained without the need to formally specify a prior or sample from a posterior distribution, which is why it is sometimes referred to as a "poor man's" Bayesian posterior. By perturbing the data, the bootstrap method approximates the Bayesian effect of perturbing the parameters, usually in a simpler and more accessible manner.