Simulating Correlated Bernoulli Data

Andrew Yan

Jul 25, 2024

Correlated Bernoulli data are common in medical research and clinical trials. In most cases, the correlations between the binary responses can be ignored without significantly affecting statistical inference. However, in power and sample size calculations involving multiple binary endpoints, the correlations need to be accounted for, and doing so often requires trial simulation. This post introduces the bivariate Bernoulli distribution with prespecified correlation and marginal means.

In practice, the marginal means 𝑝 and 𝑞 and the correlation ρ are often estimated from historical data. The paired data (𝑋, 𝑌) can then be generated from the multinomial distribution with parameters 𝑝₁, 𝑝₂, 𝑝₃ and 𝑝₄ as determined by Equations (1)-(4).
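The generation step can be sketched in Python as follows. This is a minimal illustration, not the post's own code. It assumes the four cell probabilities follow from the standard identity E[XY] = pq + ρ·sqrt(p(1-p)q(1-q)), and the cell order (1,1), (1,0), (0,1), (0,0) is an assumption that may not match the numbering in Equations (1)-(4). The function names are hypothetical.

```python
import math
import random

# Assumed cell order (may differ from the post's numbering of p1..p4).
OUTCOMES = [(1, 1), (1, 0), (0, 1), (0, 0)]

def joint_probs(p, q, rho):
    """Cell probabilities for Bernoulli means p, q (both in (0, 1)) and correlation rho."""
    cov = rho * math.sqrt(p * (1 - p) * q * (1 - q))  # Cov(X, Y) = E[XY] - p*q
    p11 = p * q + cov          # P(X=1, Y=1)
    p10 = p - p11              # P(X=1, Y=0)
    p01 = q - p11              # P(X=0, Y=1)
    p00 = 1 - p - q + p11      # P(X=0, Y=0)
    probs = [p11, p10, p01, p00]
    # Not every rho is attainable for given p and q; reject invalid combinations.
    if min(probs) < -1e-12:
        raise ValueError("rho is not attainable for these marginal means")
    return [max(x, 0.0) for x in probs]

def simulate_pairs(p, q, rho, n, seed=None):
    """Draw n pairs (x, y) by sampling one of the four cells for each pair."""
    rng = random.Random(seed)
    return rng.choices(OUTCOMES, weights=joint_probs(p, q, rho), k=n)

def sample_correlation(pairs):
    """Pearson correlation of the simulated binary pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum(x * y for x, y in pairs) / n - mx * my
    return cov / math.sqrt(mx * (1 - mx) * my * (1 - my))
```

For example, `simulate_pairs(0.3, 0.5, 0.2, n=100_000, seed=1)` returns a list of (x, y) tuples whose sample means and correlation should be close to the inputs. The range check matters because the attainable values of ρ depend on p and q: a combination such as p = 0.1, q = 0.9, ρ = 0.99 produces a negative cell probability and is rejected.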

Multivariate Bernoulli distributions for three or more variables can be derived using more general approaches, which are often difficult to implement. Additionally, different joint distributions with the same marginal means and pairwise correlations can be obtained depending on the approach used. This is because the joint distribution has more free parameters than the number of quantities being specified (the means and the pairwise correlations). In the case of three variables, for example, the joint distribution has 2³ - 1 = 7 free parameters, but the three means and three pairwise correlations fix only 6 of them.
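The non-uniqueness for three variables can be seen in a small example. The sketch below is an illustration chosen for this purpose and is not from the post. It compares two joint distributions on {0,1}³: three independent fair coins, and two independent fair coins with the third set to their exclusive OR. Both have means of 0.5 and pairwise correlations of 0, yet they are different distributions. The helper name `moments` is hypothetical.

```python
from itertools import product

PAIRS = [(0, 1), (0, 2), (1, 2)]

def moments(pmf):
    """Marginal means and pairwise correlations of a pmf over {0,1}^3."""
    means = [sum(pr * x[i] for x, pr in pmf.items()) for i in range(3)]
    corrs = {}
    for i, j in PAIRS:
        e_ij = sum(pr * x[i] * x[j] for x, pr in pmf.items())  # E[Xi * Xj]
        cov = e_ij - means[i] * means[j]
        sd = (means[i] * (1 - means[i]) * means[j] * (1 - means[j])) ** 0.5
        corrs[(i, j)] = cov / sd
    return means, corrs

cells = list(product((0, 1), repeat=3))

# Joint A: three independent fair coins, each cell has probability 1/8.
independent = {x: 1 / 8 for x in cells}

# Joint B: X1 and X2 are independent fair coins and X3 = X1 XOR X2,
# so only the four cells with x3 == x1 ^ x2 carry mass.
xor = {x: (0.25 if x[2] == (x[0] ^ x[1]) else 0.0) for x in cells}

means_a, corrs_a = moments(independent)
means_b, corrs_b = moments(xor)

# The third-order moment E[X1 * X2 * X3] is where the two joints differ.
triple_a = sum(pr * x[0] * x[1] * x[2] for x, pr in independent.items())
triple_b = sum(pr * x[0] * x[1] * x[2] for x, pr in xor.items())
```

The six specified quantities agree between the two joints, while `triple_a` is 1/8 and `triple_b` is 0. That remaining degree of freedom is the seventh parameter, which the means and pairwise correlations leave undetermined.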
