You might have heard that the bootstrap procedure can be seen as an approximation of Bayesian inference. This post presents two examples to illustrate this relationship. The first example is based on continuous data with a normal distribution, and the second involves data with a discrete distribution on a finite sample space.
Example 1: Let x = (x₁, ..., xₙ) be a random sample of size n obtained from a normal distribution N(μ, σ²), where μ is the unknown mean and σ² is a known variance. To perform a Bayesian analysis for μ, we assume a prior distribution μ ∼ N(0, τ²), which leads to the posterior distribution

μ | x ∼ N( nτ²x̄/(nτ² + σ²), σ²τ²/(nτ² + σ²) ),

where x̄ = (x₁ + ... + xₙ)/n denotes the sample mean. Note that the larger τ² is, the closer the posterior mean is to x̄, the maximum likelihood estimate (MLE) of μ. As τ² → ∞ we obtain a noninformative prior, and hence the posterior distribution μ | x ∼ N(x̄, σ²/n). This is identical to the sampling distribution of x̄* = (x₁* + ... + xₙ*)/n, where x* = (x₁*, ..., xₙ*) is a parametric bootstrap sample of size n generated from N(x̄, σ²), the sampling distribution with μ set to its MLE.
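To see the match numerically, here is a minimal sketch in Python (NumPy assumed; the values of μ, σ, and n are invented for illustration). It draws parametric bootstrap replicates of x̄* and compares them with draws from the flat-prior posterior N(x̄, σ²/n):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: data from N(mu, sigma^2) with sigma known
mu_true, sigma, n = 1.5, 2.0, 50           # hypothetical values
x = rng.normal(mu_true, sigma, size=n)
xbar = x.mean()                            # MLE of mu

B = 50_000

# Parametric bootstrap: resample from the fitted model N(xbar, sigma^2)
xbar_star = rng.normal(xbar, sigma, size=(B, n)).mean(axis=1)

# Flat-prior posterior: mu | x ~ N(xbar, sigma^2 / n)
mu_post = rng.normal(xbar, sigma / np.sqrt(n), size=B)

print(f"bootstrap mean, sd: {xbar_star.mean():.4f}, {xbar_star.std():.4f}")
print(f"posterior mean, sd: {mu_post.mean():.4f}, {mu_post.std():.4f}")
print(f"theory sd = sigma/sqrt(n): {sigma / np.sqrt(n):.4f}")
```

Both sets of draws center on x̄ with standard deviation close to σ/√n.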
Example 2: Let a = (a₁, ..., a_K) be the vector of all possible distinct values of a random variable X, and let p = (p₁, ..., p_K) be the associated vector of probabilities

pₖ = P(X = aₖ | p),  Σₖ pₖ = 1.

Let x = (x₁, ..., xₙ) be a random sample of size n from the distribution of X and p̂ = (p̂₁, ..., p̂_K) the vector of observed proportions of the values in a. To make Bayesian inference on p, we use a symmetric Dirichlet distribution as the prior: p ∼ Dir(α1), where α > 0 is a constant (a hyperparameter of the prior) and 1 denotes the vector of 1s. The posterior distribution is then p | x ∼ Dir(α1 + np̂). Letting α → 0, we obtain an improper prior, which gives the posterior distribution

p | x ∼ Dir(np̂).
Now the bootstrap distribution obtained by sampling with replacement from the data x = (x₁, ..., xₙ) can be expressed in terms of sampling the category proportions from a multinomial distribution. Specifically, the vector of observed proportions p̂* from a bootstrap sample x* = (x₁*, ..., xₙ*) has the following distribution:

p̂* ∼ Mult(n, p̂)/n,

where Mult(n, p̂) denotes the multinomial distribution with parameters n and p̂.
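As a quick sanity check, here is a sketch (again Python with NumPy; the sample space and probabilities are invented for illustration) comparing the proportions obtained by resampling the data with replacement against direct draws from Mult(n, p̂)/n:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative finite sample space a and data x
a = np.array([0.0, 1.0, 2.5, 4.0])                  # K = 4 hypothetical values
x = rng.choice(a, size=200, p=[0.1, 0.4, 0.3, 0.2])
n = len(x)
p_hat = np.array([(x == v).mean() for v in a])      # observed proportions

B = 10_000

# Route 1: resample x with replacement and record category proportions
idx = rng.integers(0, n, size=(B, n))
p_star_resample = np.stack([(x[idx] == v).mean(axis=1) for v in a], axis=1)

# Route 2: draw counts from Mult(n, p_hat) and divide by n
p_star_mult = rng.multinomial(n, p_hat, size=B) / n

print("resampling  mean:", p_star_resample.mean(axis=0).round(3))
print("multinomial mean:", p_star_mult.mean(axis=0).round(3))
print("resampling  sd  :", p_star_resample.std(axis=0).round(3))
print("multinomial sd  :", p_star_mult.std(axis=0).round(3))
```

The two routes produce the same distribution of proportions, as the matching means and standard deviations confirm.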
The bootstrap distribution of p̂* is very similar to the posterior distribution of p | x: the two distributions have the same mean, p̂, and nearly the same variance-covariance matrix (the two covariance matrices differ only by a factor of n/(n + 1)).
Since any parameter ψ(p, a) associated with the distribution of X is completely determined by p (the values a are fixed), the bootstrap distribution of ψ(p̂*, a) will closely approximate the posterior distribution of ψ(p, a).
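Continuing the sketch above (same invented a, x, and p̂; ψ is taken to be the mean Σₖ aₖpₖ purely as an example), we can compare the bootstrap and posterior distributions of a parameter directly:

```python
import numpy as np

rng = np.random.default_rng(2)

# Same illustrative setup as before
a = np.array([0.0, 1.0, 2.5, 4.0])
x = rng.choice(a, size=200, p=[0.1, 0.4, 0.3, 0.2])
n = len(x)
p_hat = np.array([(x == v).mean() for v in a])  # assumes every value of a appears in x

B = 50_000

# Example parameter: psi(p, a) = sum_k a_k * p_k, the mean of X
# Bootstrap distribution: psi(p_hat*, a) with p_hat* ~ Mult(n, p_hat)/n
psi_boot = (rng.multinomial(n, p_hat, size=B) / n) @ a

# Posterior distribution: psi(p, a) with p | x ~ Dir(n * p_hat)
psi_post = rng.dirichlet(n * p_hat, size=B) @ a

print(f"bootstrap mean, sd: {psi_boot.mean():.4f}, {psi_boot.std():.4f}")
print(f"posterior mean, sd: {psi_post.mean():.4f}, {psi_post.std():.4f}")
```

The two sets of draws are nearly indistinguishable; their standard deviations differ by roughly the factor √(n/(n + 1)) implied by the covariance comparison above.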
Note that the bootstrap distribution is obtained without the need to formally specify a prior or sample from a posterior distribution, which is why it is sometimes referred to as a "poor man's" Bayesian posterior. By perturbing the data, the bootstrap method approximates the Bayesian effect of perturbing the parameters, usually in a simpler and more accessible manner.