This is a quick introduction to simple, rejection approximate Bayesian compution and more useful sequential methods (because I needed to make the diagrams for a poster).
Simple Rejection ABC
Many of the tools for inference about a parameter set \(\Theta\) given observed data \(x\) require the ability to evaluate the likelihood \(f(\Theta | x)\). When this likelihood is analytically or computationally intractable, approximate Bayesian computation (ABC) allows for likelihood-free inference when the model can be simulated from, and is a popular tool in areas of genetics, climate study and ecology.
The simplest implementation of ABC is a rejection sampler (Pritchard et al. 1999), with schematic diagram shown in Fig 1.
The rejection ABC algorithm is summarised as:
- A set of \(N\) “particles” (i.e. parameter sets) are proposed from the prior distribution.
- Each particle is used to generate a simulated dataset under the model (\(x_i \sim f(x_i~|\Theta_i)\)).
- Each simulation is compared with the observed data, \(x\). This comparison takes the form of a ‘distance’ - in the above schematic the distance is colour similarity.
- Particles are accepted based on their corresponding simulation’s distance - either by setting a tolerance threshold for the distance and accepting all particles with smaller distance than this, or by accepting the \(\alpha\) proportion of particles with the smallest distance (i.e. the set of particles that led to the ‘best’/most similar simulations).
- The accepted sample of parameters give an approximation to the target posterior distribution.
The performance of ABC is obviously very closely related to the choice of the distance function, \(\delta(x_i, x)\), between simulated data \(x_i\) and observed data \(x\). Usually, this choice is the Euclidean distance between a set of summary statistics evaluated at \(x\) and \(x_i\) - in the above, summary being “colour”. I’m being purposefully vague on this point, and recommend Prangle (2015) and Prangle (2017) for more information on summary statistics.
The simple ABC method often leads to low acceptance rates of parameters, or the acceptance of parameters with large distances from the observed data resulting in a poor approximation to the posterior. This is especially so when the prior distribution is very different from the true posterior. Rather than completing a single step from prior to posterior that would require a computationally inefficient number of proposed particles and corresponding simulations, sequential methods have been proposed that iterate over a sequence of distributions from the prior that get ever ‘closer’ to the posterior. The idea here is to have a higher acceptance at each step than would be required by the simple rejection method, with each step focusing further on the regions of parameter space with high likelihood, with the hope of reducing the computational burden by requiring fewer iterations overall than simple rejection.
A number of sequential ABC methods have been proposed based on the same idea that differ slightly in implementation. Sisson, Fan, and Tanaka (2009) and Toni et al. (2009) require a set of decreasing tolerance levels for the distance function at each step to be specified, which is unideal as the appropriate levels are difficult to ascertain. Drovandi and Pettitt (2011) and Del Moral, Doucet, and Jasra (2012) do not require this, and instead estimate an appropriate tolerance level at each step, however these methods can lead to the duplication of particles.
I’m going to focus on the Lenormand, Jabot, and Deffuant (2013) approach - because that’s the version I’m using and have therefore made a diagram of. This approach has an automated tolerance level that is based on accepting the best sample (of constant size) of particles at each step, but differs from the above approaches because particles from previous steps get taken through to future steps if they are still within that set of “best” particles. This is an attractive feature, computationally, as you retain particles with high likelihood. Additionally, the other methods described require repeated simulations at each step until a certain number of particles has been accepted. In Lenormand, Jabot, and Deffuant (2013) the number of simulations at each step is constant, and so the computational cost of another step is always known, which is a nice feature if (more than likely) the simulation is computationally burdensome.
The ‘adaptive population Monte Carlo (APMC) ABC sampler’ is shown in the schematic of Fig 2.
- An initial step the same as simple rejection ABC is carried out, retaining the \(\alpha N\) particles with smallest distance to the observed data.
- Each iterative step then consists of taking the set of retained particles and taking a weighted resample of \((1-\alpha)N\) particles. This sample is perturbed (usually by a normal random walk with variance determined by the variation in the current retained particle set) to give the set of proposed particles.
- As in simple ABC, the proposed particles are used to simulate data from the model.
- The simulated data from the retained particles and the new proposed particles are combined, compared with the observed data, and the \(\alpha N\) particles from this set of \(N\) with the smallest distance are retained. This forms the new “best” set of particles and corresponds to a tolerance level of the \(\alpha\) percentile of the distances (therefore guaranteed to decrease at each step).
- The set of weights for the retained particles that were proposed this step are calculated. This weight is based on the weight from an importance sample, i.e. how likely it was to propose that particular particle.
- These steps are repeated until a threshold is met regarding the number of new particles being accepted over the ones retained from the previous step - i.e. when the posterior is not changing because new particles aren’t being accepted, there is little to be gained from further steps.
Del Moral, P, A Doucet, and A Jasra. 2012. “An adaptive sequential Monte Carlo method for approximate Bayesian computation.” Statistics and Computing 22 (5): 1009–20. doi:10.1007/s11222-011-9271-y.
Drovandi, CC, and AN Pettitt. 2011. “Likelihood-free Bayesian estimation of multivariate quantile distributions.” Computational Statistics and Data Analysis 55 (9). Elsevier B.V.: 2541–56. doi:10.1016/j.csda.2011.03.019.
Lenormand, M, F Jabot, and G Deffuant. 2013. “Adaptive approximate Bayesian computation for complex models.” Computational Statistics 28: 2777–96. http://arxiv.org/abs/1111.1308.
Prangle, D. 2015. “Summary Statistics in Approximate Bayesian Computation.” arXiv, 1–25. http://arxiv.org/abs/1512.05633.
———. 2017. “Adapting the ABC distance function.” Bayesian Analysis 12 (1): 289–309. doi:10.1214/16-BA1002.
Pritchard, JK, MT Seielstad, A Perez-Lezaun, and MW Feldman. 1999. “Population growth of human Y chromosomes: a study of Y chromosome microsatellites.” Molecular Biology and Evolution 16 (12): 1791–8. doi:10.1093/oxfordjournals.molbev.a026091.
Sisson, SA, Y Fan, and MM Tanaka. 2009. “Correction for Sisson et al., Sequential Monte Carlo without likelihoods.” Proceedings of the National Academy of Sciences 106 (39): 16889–9. doi:10.1073/pnas.0908847106.
Toni, T, D Welch, N Strelkowa, A Ipsen, and MPH Stumpf. 2009. “Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems.” Journal of the Royal Society Interface 6 (July 2008): 187–202. doi:10.1098/rsif.2008.0172.