Contents
The UK Survey of Personal Incomes
Validation results
Analysing model discrepancies
Conclusion
To ensure accuracy and precision, PolicyEngine periodically validates its free and open-source microsimulation model against external datasets. This process involves taking an external dataset that represents a particular slice of British society and loading its inputs, such as total individual employment income, into our data model. We then compare our calculated outputs with those in the dataset to find discrepancies between the two. In this post, we'll describe our efforts to validate our model against the newly released
The Survey of Personal Incomes (SPI) is a comprehensive dataset produced by HM Revenue and Customs (HMRC). It provides detailed information on individuals' income, tax, and other financial data based on a sample of HMRC records. The SPI is among the most comprehensive UK individual-level financial datasets, and is used by HMRC, HM Treasury, and individual Members of Parliament, among others.[^1]
We chose to validate PolicyEngine against the SPI's individual income tax figures: we feed the SPI's input variables into PolicyEngine UK and compare its calculated total income tax against the amount the SPI reports.
To do so, we first remove the SPI's "composite records," each of which represents a small population group as a collective. These records are meant only for society-wide modelling, and their outputs cannot be correctly computed from their inputs. There are 1,770 such observations, around 0.2% of total records.
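The filtering step can be sketched roughly as follows. This is a minimal illustration, not our production code: the `SpiRecord` layout, the `is_composite` flag, and the `drop_composite_records` helper are all hypothetical names, and the records are toy values rather than SPI data.

```python
from dataclasses import dataclass


@dataclass
class SpiRecord:
    # Hypothetical, simplified layout; the real SPI uses its own variable names.
    record_id: int
    employment_income: float
    reported_income_tax: float
    is_composite: bool  # assumed flag marking a composite record


def drop_composite_records(records):
    """Return the non-composite records and the share of records removed."""
    kept = [r for r in records if not r.is_composite]
    removed_share = (len(records) - len(kept)) / len(records) if records else 0.0
    return kept, removed_share


# Toy data for illustration only; these are not SPI values.
toy_records = [
    SpiRecord(1, 30_000.0, 3_500.0, False),
    SpiRecord(2, 55_000.0, 9_500.0, False),
    SpiRecord(3, 2_000_000.0, 885_000.0, True),
    SpiRecord(4, 12_000.0, 0.0, False),
]

kept_records, removed_share = drop_composite_records(toy_records)
print(len(kept_records), f"{removed_share:.1%}")
```

Returning the removed share alongside the filtered list makes it easy to check that the exclusion stays small relative to the full sample.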
We then treat relevant variables from the SPI 2020/21 dataset[^2] as input variables into the PolicyEngine UK model. Using this data, we run a household-level simulation for each record and group the outputs into logarithmic bands according to how close PolicyEngine's calculated 2020/21 total income tax is to the SPI's. We treat a result within £10 as a match to allow for differences in rounding. Our final match statistics are displayed in the table below.
For 92.7% of individuals included within the SPI, PolicyEngine UK's calculated total income tax is within £10 of the SPI's calculated amount. We view this as a robust demonstration of PolicyEngine's accuracy and precision.
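The banding and match-rate calculation described above can be sketched as follows. Only the £10 match threshold comes from this post; the other band edges, the labels, and the function names are assumptions made for illustration, and the example computes an unweighted share over toy values.

```python
from bisect import bisect_left
from collections import Counter

# Upper bound of each band in pounds, one per power of ten.
# Assumption: only the £10 threshold is stated in the post; the rest are illustrative.
BIN_EDGES = [10, 100, 1_000, 10_000]
BIN_LABELS = [
    "within £10 (match)",
    "£10 to £100",
    "£100 to £1,000",
    "£1,000 to £10,000",
    "over £10,000",
]


def categorise_difference(calculated_tax, reported_tax):
    """Place the absolute difference between two tax amounts in a log-scale band."""
    difference = abs(calculated_tax - reported_tax)
    # bisect_left keeps a difference of exactly £10 in the match band.
    return BIN_LABELS[bisect_left(BIN_EDGES, difference)]


def match_rate(pairs):
    """Unweighted share of (calculated, reported) pairs that fall in the match band."""
    counts = Counter(categorise_difference(c, r) for c, r in pairs)
    return counts[BIN_LABELS[0]] / len(pairs), counts


# Toy (calculated, reported) pairs for illustration only; not SPI values.
toy_pairs = [(3_500.0, 3_498.0), (9_500.0, 9_500.0), (0.0, 250.0), (7_200.0, 7_150.0)]
rate, counts = match_rate(toy_pairs)
print(f"{rate:.0%}", dict(counts))
```

Comparing against fixed band edges, rather than taking a logarithm of each difference, avoids special handling for a difference of exactly zero.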
For the remaining records where our calculations do not closely match the SPI, we believe the differences are attributable to two main factors:
Four relevant error clusters appeared among the records where the SPI and PolicyEngine calculate differing outputs:
PolicyEngine's UK model shows a high level of parity with the SPI's 2020/21 data: given the same inputs, almost 93% of records return a total income tax calculation within £10 of the SPI's. While some discrepancies exist, they are limited in scope and often attributable either to testability limitations within the SPI's structure or to inputs that the SPI has not made publicly available. This validation exercise builds upon our previous validations, such as our
Anthony Volk
Full-Stack Engineer at PolicyEngine
PolicyEngine is a registered charity with the Charity Commission of England and Wales (no. 1210532) and as a private company limited by guarantee with Companies House (no. 15023806).
© 2025 PolicyEngine. All rights reserved.