
Autumn 2023 model calibration update

Evaluating PolicyEngine's model performance with the latest official statistics.

By Nikhil Woodruff

4 December 2023

3 min read


Contents

Model overview

How PolicyEngine differs from other models

Validation

PolicyEngine's free and open-source microsimulation model estimates the budgetary, distributional and poverty impacts of UK tax and benefit reforms by simulating the full details of policy over a large representative dataset of UK households. In this post, we'll provide a brief overview of how PolicyEngine UK's microsimulation model works, and an update on how we maintain and validate the model's accuracy.

Model overview

PolicyEngine UK is a static microsimulation model: it does not (yet) incorporate behavioural responses, such as labour supply reactions to policy changes. Instead, we assume that households do not change their behaviour in response to policy changes, and that policy changes affect households only through their direct effects on household incomes.

To estimate the direct effects of policy changes, we apply the actual policy rules as specified in legislation to each household in a large survey (tens of thousands of households) of the UK. We can then change the rules and see how the totals of different variables change. For example, we could change the personal tax allowance from £12,570 to £15,000 and aggregate the tax payments before and after the policy change to estimate how total tax collected from the households in our survey would change.
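As a toy illustration of this before-and-after aggregation (not PolicyEngine's actual implementation: the incomes, weights and single-band tax schedule below are invented for the sketch), the core calculation is a weighted sum over households under each parameter setting:

```python
import numpy as np

# Toy data: taxable income and survey weight for five hypothetical households.
income = np.array([10_000, 20_000, 35_000, 60_000, 120_000], dtype=float)
weights = np.array([1.2e6, 1.5e6, 1.1e6, 0.8e6, 0.4e6])  # households represented

def income_tax(income, personal_allowance):
    """Highly simplified schedule: a flat 20% rate above the allowance."""
    return 0.20 * np.maximum(income - personal_allowance, 0)

# Weighted aggregate revenue under the baseline and reform allowances.
baseline = (income_tax(income, 12_570) * weights).sum()
reform = (income_tax(income, 15_000) * weights).sum()
print(f"Revenue change: £{reform - baseline:,.0f}")
```

Raising the allowance lowers each affected household's tax by the same amount (20% of the £2,430 increase), so the aggregate change is that saving times the total weight of households above the new allowance.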

The model is written in Python, and you can follow all of our real-time development on GitHub. Other organisations maintain models that use the same microsimulation approach: the IFS's TAXBEN, UKMOD at the University of Essex, the IPPR model, and internal models at HMRC and DWP. However, only PolicyEngine UK and UKMOD publish their policy implementation details and validation statistics.

How PolicyEngine differs from other models

PolicyEngine's core approach to estimating policy impacts is the same as that of other static microsimulation models. However, we use a novel data-science-based approach that significantly improves the accuracy of the model's outputs compared to other models (where we have been able to compare).

Microsimulation models are widely used by researchers to estimate policy impacts (questions for which we don't know the answer). But when we attempt to validate the models by asking them questions for which we do know the answer (for example, total Income Tax revenue in 2021), we often find that the model answers are significantly different from the ground truth. This problem is large and exists in every microsimulation model that publishes details of attempts to measure it.

Assuming the policy implementations in the model are correct (the law is complex and we cannot test every possible household, but we publish and pass hundreds of automated tests on every version update), the most likely explanation is that the model's survey data is not representative of the population: the model's outputs are only as good as the data we feed into it.
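To give a flavour of what such automated checks look like (this is an invented illustration with a simplified tax function, not a test from PolicyEngine's actual suite), each test compares a computed value for a concrete household against a hand-worked expected result:

```python
# Illustrative policy rule: a single 20% band above the personal allowance.
def basic_rate_tax(income, allowance=12_570, rate=0.20):
    """Tax due under a simplified one-band schedule."""
    return rate * max(income - allowance, 0)

# A household earning £20,000 should owe 20% of (20,000 - 12,570) = £1,486.
assert abs(basic_rate_tax(20_000) - 1_486) < 1e-9
# A household below the allowance should owe nothing.
assert basic_rate_tax(10_000) == 0
```

Hundreds of such household-level cases, run on every version update, guard against regressions in the policy logic itself, so that remaining validation error can be attributed to the data.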

We have adopted an approach that reduces this problem by using machine learning techniques to improve the survey's accuracy with data from other trusted sources: the OBR, HMRC, DWP, ONS and others. We essentially do the following:

  1. Take the initial survey data
  2. Add synthetic households (using other microdata) and previous-year households with zero weight
  3. Collect trusted external statistics describing tax-benefit and demographic properties of the UK
  4. Train a machine learning model to adjust the weights of the survey to best fit those external statistics
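The reweighting in step 4 can be sketched as gradient descent on log-weights against the relative errors in the external targets. This is a minimal sketch with invented numbers and only two targets; the production model trains against many more statistics with a richer loss:

```python
import numpy as np

# Each row is a household; columns are the quantities the targets constrain
# (here: income tax paid, and a count of 1 per household). Figures invented.
metrics = np.array([
    [1_500.0, 1.0],
    [4_500.0, 1.0],
    [9_500.0, 1.0],
    [0.0, 1.0],
])
targets = np.array([30e9, 28e6])  # e.g. aggregate income tax (£), households

log_w = np.log(np.full(len(metrics), 5e6))  # log-weights keep weights positive
lr = 0.1
for _ in range(2_000):
    w = np.exp(log_w)
    rel_err = (w @ metrics - targets) / targets
    # Gradient of the squared relative error with respect to the log-weights.
    grad = ((rel_err / targets) @ metrics.T) * w
    log_w -= lr * grad
w = np.exp(log_w)
print((w @ metrics - targets) / targets)  # relative errors after calibration
```

Optimising log-weights rather than weights directly guarantees the calibrated weights stay positive, and expressing the loss in relative (not absolute) terms puts targets of very different magnitudes on a comparable scale.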

The resulting weighted survey powers PolicyEngine's impact estimates.

Validation

PolicyEngine meets tax-benefit and demographic totals closely, and estimates program impacts over a five-year horizon. For example, the chart below shows our projections for three key benefits: Child Benefit, Housing Benefit and Universal Credit.

Figure 1: PolicyEngine UK's projections for three key benefits

[Bar chart: budgetary impact (£, 0 to 80bn) of Child Benefit, Housing Benefit and Universal Credit by calendar year, 2023 to 2027.]

But how does PolicyEngine align with the best estimates of the ground truth? We can compare PolicyEngine's estimates with two other sources to see how our data enhancement approach performs: the original survey data, and official statistics and projections from government. The chart below shows, for each calendar year in the budget horizon, how the relative errors in tax-benefit-related statistical targets improve or worsen. Over 80% of these targets improve.
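Concretely, for each target we compare the absolute relative error before and after calibration. A toy example with invented figures for a single target (the metric itself matches how the chart is constructed, but the numbers are not real):

```python
def relative_error(estimate, truth):
    return (estimate - truth) / truth

# Hypothetical target: a benefit's official annual spending vs two estimates.
official = 12.5e9      # assumed ground-truth total (£)
raw_survey = 10.9e9    # aggregate under the unadjusted survey weights
calibrated = 12.4e9    # aggregate after reweighting

before = abs(relative_error(raw_survey, official))
after = abs(relative_error(calibrated, official))
print(f"Change in relative error: {(after - before) / before:+.0%}")
```

A negative change means calibration moved the estimate closer to the official figure; the distribution of these changes across all targets and years is what the chart summarises by quantile.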

Figure 2: Relative errors in tax-benefit-related statistical targets over the budget horizon

[Chart: change in relative error (−100% to +200%) by quantile (0.1 to 0.9) of targets, for each calendar year 2023 to 2027.]

We've also made all our calibration validation results available in an interactive dashboard on GitHub (screenshot below). We welcome feedback or comments on our approach; feel free to get in touch.

Figure 3: PolicyEngine UK's calibration validation dashboard





PolicyEngine is a registered charity with the Charity Commission of England and Wales (no. 1210532) and as a private company limited by guarantee with Companies House (no. 15023806).
