
Autumn 2023 model calibration update

Evaluating PolicyEngine's model performance with the latest official statistics.

By Nikhil Woodruff

4 December 2023

3 min read


Contents

Model overview

How PolicyEngine differs from other models

Validation

PolicyEngine's free and open-source microsimulation model estimates the budgetary, distributional and poverty impacts of UK tax and benefit reforms by simulating the full details of policy over a large representative dataset of UK households. In this post, we'll provide a brief overview of how PolicyEngine UK's microsimulation model works, and an update on how we maintain and validate the model's accuracy.

Model overview

PolicyEngine UK is a static microsimulation model: it does not (yet) incorporate behavioural responses, such as labour supply reactions to policy changes. Instead, we assume that households do not change their behaviour in response to policy changes, and that policy changes affect households only through their direct effects on household incomes.

To estimate the direct effects of policy changes, we apply the actual policy rules as specified in legislation to each household in a large survey (tens of thousands of households) of the UK. We can then change the rules and see how the totals of different variables change. For example, we could change the personal tax allowance from £12,570 to £15,000 and aggregate the tax payments before and after the policy change to estimate how total tax collected from the households in our survey would change.
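As a toy illustration of this before-and-after aggregation (not PolicyEngine's actual implementation: the incomes, weights and single-band tax schedule below are invented for the sketch), the core calculation is a weighted sum over households under each parameter setting:

```python
import numpy as np

# Toy data: taxable income and survey weight for five hypothetical households.
income = np.array([10_000, 20_000, 35_000, 60_000, 120_000], dtype=float)
weights = np.array([1.2e6, 1.5e6, 1.1e6, 0.8e6, 0.4e6])  # households represented

def income_tax(income, personal_allowance):
    """Highly simplified schedule: a flat 20% rate above the allowance."""
    return 0.20 * np.maximum(income - personal_allowance, 0)

# Weighted aggregate revenue under the baseline and reform allowances.
baseline = (income_tax(income, 12_570) * weights).sum()
reform = (income_tax(income, 15_000) * weights).sum()
print(f"Revenue change: £{reform - baseline:,.0f}")
```

Raising the allowance lowers each affected household's tax by the same amount (20% of the £2,430 increase), so the aggregate change is that saving times the total weight of households above the new allowance.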

The model is written in Python, and you can follow all of our real-time development on GitHub. Other organisations maintain models that use the same microsimulation approach: the IFS's TAXBEN, UKMOD at the University of Essex, the IPPR model, and internal models at HMRC and DWP. However, only PolicyEngine UK and UKMOD publish their policy implementation details and validation statistics.

How PolicyEngine differs from other models

PolicyEngine's core approach to estimating policy impacts is the same as that of other static microsimulation models. However, we use a novel data-science-based approach that significantly improves the accuracy of the model's outputs compared to other models (where we have been able to compare).

Microsimulation models are widely used by researchers to estimate policy impacts (questions for which we don't know the answer). But when we attempt to validate the models by asking them questions for which we do know the answer (for example, total Income Tax revenue in 2021), we often find that the model answers are significantly different from the ground truth. This problem is large and exists in every microsimulation model that publishes details of attempts to measure it.

Assuming the policy implementations in the model are correct (the law is complex and we cannot test every possible household, but we publish and pass hundreds of automated tests on every version update), the most likely explanation is that the model's survey data is not representative of the population: the model's outputs are only as good as the data we feed into it.
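To give a flavour of what such automated checks look like (this is an invented illustration with a simplified tax function, not a test from PolicyEngine's actual suite), each test compares a computed value for a concrete household against a hand-worked expected result:

```python
# Illustrative policy rule: a single 20% band above the personal allowance.
def basic_rate_tax(income, allowance=12_570, rate=0.20):
    """Tax due under a simplified one-band schedule."""
    return rate * max(income - allowance, 0)

# A household earning £20,000 should owe 20% of (20,000 - 12,570) = £1,486.
assert abs(basic_rate_tax(20_000) - 1_486) < 1e-9
# A household below the allowance should owe nothing.
assert basic_rate_tax(10_000) == 0
```

Hundreds of such household-level cases, run on every version update, guard against regressions in the policy logic itself, so that remaining validation error can be attributed to the data.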

We have adopted an approach that reduces this problem by using machine learning techniques to improve the survey's accuracy with data from other trusted sources: the OBR, HMRC, DWP, ONS and others. We essentially do the following:

  1. Take the initial survey data
  2. Add synthetic households (using other microdata) and previous-year households with zero weight
  3. Collect trusted external statistics describing tax-benefit and demographic properties of the UK
  4. Train a machine learning model to adjust the weights of the survey to best fit those external statistics
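The reweighting in step 4 can be sketched as gradient descent on log-weights against the relative errors in the external targets. This is a minimal sketch with invented numbers and only two targets; the production model trains against many more statistics with a richer loss:

```python
import numpy as np

# Each row is a household; columns are the quantities the targets constrain
# (here: income tax paid, and a count of 1 per household). Figures invented.
metrics = np.array([
    [1_500.0, 1.0],
    [4_500.0, 1.0],
    [9_500.0, 1.0],
    [0.0, 1.0],
])
targets = np.array([30e9, 28e6])  # e.g. aggregate income tax (£), households

log_w = np.log(np.full(len(metrics), 5e6))  # log-weights keep weights positive
lr = 0.1
for _ in range(2_000):
    w = np.exp(log_w)
    rel_err = (w @ metrics - targets) / targets
    # Gradient of the squared relative error with respect to the log-weights.
    grad = ((rel_err / targets) @ metrics.T) * w
    log_w -= lr * grad
w = np.exp(log_w)
print((w @ metrics - targets) / targets)  # relative errors after calibration
```

Optimising log-weights rather than weights directly guarantees the calibrated weights stay positive, and expressing the loss in relative (not absolute) terms puts targets of very different magnitudes on a comparable scale.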

The resulting weighted survey powers PolicyEngine's impact estimates.

Validation

PolicyEngine meets tax-benefit and demographic totals closely, and estimates program impacts over a five-year horizon. For example, the chart below shows our projections for three key benefits: Child Benefit, Housing Benefit and Universal Credit.

Figure 1: PolicyEngine UK's projections for three key benefits

[Bar chart: budgetary impact (£, 0 to 80bn) of Child Benefit, Housing Benefit and Universal Credit by calendar year, 2023 to 2027.]

But how does PolicyEngine align with the best estimates of the ground truth? We can compare PolicyEngine's estimates with two other sources to see how our data enhancement approach performs: the original survey data, and official statistics and projections from government. The chart below shows, for each calendar year in the budget horizon, how the relative errors in tax-benefit-related statistical targets improve or worsen. Over 80% of these targets improve.
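Concretely, for each target we compare the absolute relative error before and after calibration. A toy example with invented figures for a single target (the metric itself matches how the chart is constructed, but the numbers are not real):

```python
def relative_error(estimate, truth):
    return (estimate - truth) / truth

# Hypothetical target: a benefit's official annual spending vs two estimates.
official = 12.5e9      # assumed ground-truth total (£)
raw_survey = 10.9e9    # aggregate under the unadjusted survey weights
calibrated = 12.4e9    # aggregate after reweighting

before = abs(relative_error(raw_survey, official))
after = abs(relative_error(calibrated, official))
print(f"Change in relative error: {(after - before) / before:+.0%}")
```

A negative change means calibration moved the estimate closer to the official figure; the distribution of these changes across all targets and years is what the chart summarises by quantile.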

Figure 2: Relative errors in tax-benefit-related statistical targets over the budget horizon

[Chart: change in relative error (−100% to +200%) by quantile (0.1 to 0.9) of targets, for each calendar year 2023 to 2027.]

We've also made all our calibration validation results available in an interactive dashboard on GitHub (screenshot below). We welcome feedback or comments on our approach; feel free to get in touch.

Figure 3: PolicyEngine UK's calibration validation dashboard





PolicyEngine is a registered charity with the Charity Commission of England and Wales (no. 1210532) and as a private company limited by guarantee with Companies House (no. 15023806).
