Andrew Yan

Bootstrap and Bayesian Inference

You might have heard that the bootstrap procedure can be seen as an approximation of Bayesian inference. This post presents two examples to illustrate this relationship. The first example is based on continuous data with a normal distribution, and the second involves data with a discrete distribution on a finite sample space.

Example 1: Let š‘„ = (š‘„ā‚, ..., š‘„ā‚™) be a random sample of size š‘› from a normal distribution š‘(šœƒ, šœŽĀ²), where šœƒ is the unknown mean and šœŽĀ² is a known variance. To perform a Bayesian analysis for šœƒ, we assume a prior distribution šœƒ āˆ¼ š‘(0, Ļ„Ā²), which leads to the posterior distribution

šœƒ|š‘„ āˆ¼ š‘( š‘›Ļ„Ā²xĢ…/(š‘›Ļ„Ā² + šœŽĀ²), šœŽĀ²Ļ„Ā²/(š‘›Ļ„Ā² + šœŽĀ²) ),

where xĢ… = (š‘„ā‚ + ... + š‘„ā‚™)/š‘› denotes the sample mean. Note that the larger Ļ„Ā² is, the more closely the posterior centers on xĢ…, the maximum likelihood estimate (MLE) of šœƒ. As Ļ„Ā² ā†’ āˆž we obtain a noninformative prior and hence the posterior distribution šœƒ|š‘„ āˆ¼ š‘(xĢ…, šœŽĀ²/š‘›). This is identical to the sampling distribution of xĢ…* = (š‘„ā‚* + ... + š‘„ā‚™*)/š‘›, where š‘„* = (š‘„ā‚*, ..., š‘„ā‚™*) is a parametric bootstrap sample of size š‘› generated from š‘(xĢ…, šœŽĀ²), the model with the MLE plugged in for šœƒ.
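This equivalence is easy to check by simulation. The sketch below draws the parametric bootstrap distribution of xĢ…* alongside samples from the noninformative-prior posterior š‘(xĢ…, šœŽĀ²/š‘›); the sample size, šœŽ, and seed are illustrative assumptions, not values from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 100, 2.0
x = rng.normal(loc=1.0, scale=sigma, size=n)   # observed sample
xbar = x.mean()                                 # MLE of theta

# Parametric bootstrap: resample from N(xbar, sigma^2), record each xbar*
B = 20000
boot_means = rng.normal(loc=xbar, scale=sigma, size=(B, n)).mean(axis=1)

# Posterior draws under the flat (tau^2 -> infinity) prior: N(xbar, sigma^2/n)
post_draws = rng.normal(loc=xbar, scale=sigma / np.sqrt(n), size=B)

print(boot_means.mean(), post_draws.mean())  # both near xbar
print(boot_means.std(), post_draws.std())    # both near sigma/sqrt(n)
```

Both empirical distributions should agree up to Monte Carlo error, since each is a sample from š‘(xĢ…, šœŽĀ²/š‘›).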

Example 2: Let š‘‘ = (š‘‘ā‚, ..., š‘‘š¾) be the vector of all possible distinct values of a random variable š‘‹, and let š‘ = (š‘ā‚, ..., š‘š¾) be the associated vector of probabilities

š‘ā‚– = š‘ƒ(š‘‹ = š‘‘ā‚– | š‘), Σₖ š‘ā‚– = 1.

Let š‘„Ā = (š‘„ā‚, ..., š‘„ā‚™) be a random sample of size š‘› from the distribution of ā…© and š‘Ģ‚ = (š‘Ģ‚ā‚, ..., š‘Ģ‚š¾) the vector of the observed proportions for š‘. To make Bayesian inference on š‘, we use a symmetric Dirichlet distribution as the prior: š‘ āˆ¼ š·š‘–š‘Ÿ(Ī±1), where Ī± > 0 is a constant (model parameter) and 1 denotes the vector of 1s. Then the posterior distribution is š‘|š‘„ āˆ¼Ā š·š‘–š‘Ÿ(Ī±1+š‘›š‘Ģ‚). Let Ī± ā†’ 0 then we obtain an improper prior, which gives the posterior distribution

š‘|š‘„ āˆ¼Ā š·š‘–š‘Ÿ(š‘›š‘Ģ‚).

Now the bootstrap distribution obtained by sampling with replacement from the data š‘„ = (š‘„ā‚, ..., š‘„ā‚™) can be expressed as sampling the category proportions from a multinomial distribution. Specifically, the observed proportions š‘Ģ‚* from a bootstrap sample š‘„* = (š‘„ā‚*, ..., š‘„ā‚™*) have the following distribution

š‘Ģ‚* āˆ¼Ā [š‘€š‘¢š‘™š‘”(š‘›, š‘Ģ‚)]/š‘›,

where š‘€š‘¢š‘™š‘”(š‘›, š‘Ģ‚) denotes the multinomial distribution with parameters š‘› and š‘Ģ‚.

The bootstrap distribution of š‘Ģ‚* is very similar to the posterior distribution of š‘|š‘„. In fact, the two distributions have the same mean š‘Ģ‚, and their covariance matrices, (diag(š‘Ģ‚) āˆ’ š‘Ģ‚š‘Ģ‚įµ€)/š‘› for the bootstrap and (diag(š‘Ģ‚) āˆ’ š‘Ģ‚š‘Ģ‚įµ€)/(š‘›+1) for the posterior, differ only by the factor š‘›/(š‘›+1).

Since any parameter Ļ†(š‘, š‘‘) associated with the distribution of š‘‹ is completely determined by š‘, the bootstrap distribution of Ļ†(š‘Ģ‚*, š‘‘) will closely approximate the posterior distribution of Ļ†(š‘, š‘‘).
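As a concrete check, take the functional Ļ†(š‘, š‘‘) = Σₖ š‘ā‚–š‘‘ā‚–, the mean of š‘‹. The sketch below compares its bootstrap distribution with its posterior distribution; the choices of š‘‘, š‘Ģ‚, and š‘› are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d = np.array([1.0, 2.0, 5.0])       # distinct values of X
p_hat = np.array([0.5, 0.3, 0.2])   # observed proportions
n, B = 200, 20000

# phi(p, d) = sum_k p_k d_k evaluated on bootstrap and posterior draws
phi_boot = (rng.multinomial(n, p_hat, size=B) / n) @ d   # phi(p_hat*, d)
phi_post = rng.dirichlet(n * p_hat, size=B) @ d          # phi(p, d) under Dir(n p_hat)

print(phi_boot.mean(), phi_post.mean())  # nearly identical
print(phi_boot.std(), phi_post.std())    # nearly identical
```

The same comparison works for any functional of š‘, e.g. a quantile or a variance, by replacing the inner product with the corresponding computation on each draw.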

Note that the bootstrap distribution is obtained without the need to formally specify a prior or sample from a posterior distribution, which is why it is sometimes referred to as a "poor man's" Bayesian posterior. By perturbing the data, the bootstrap method approximates the Bayesian effect of perturbing the parameters, usually in a simpler and more accessible manner.

