Our models are designed to make predictions about future events using data from the past, so we always need to “validate” them. This means testing how accurately a model predicts real-world situations outside of the safe, cosy environment it was conceived and built in. More technically, this is called “out-of-sample” performance. In an ideal world, the data used to train a model would be a perfect representation of what it will encounter when it needs to produce critical results in the future. However, we all understand intuitively that just because things happened a certain way in the past, there is no guarantee they will turn out exactly that way in the future.
A key part of a data scientist’s job when building unbiased models is to make use of the complexity and detail of the events present in the data we have access to, while still producing a model that can reliably predict future events that won’t follow exactly the same pattern. Model validation is an important tool for testing how well a model predicts on new data, and for warning us when the model is mapping too closely to unique events in the past (called “overfitting”).
Our SEND model is increasingly being used and heavily relied upon by local government leaders to shape and support large investment and significant policy decisions, which makes it all the more essential for us to be able to demonstrate its trustworthiness and for our clients to have total confidence in its predictions.
How we are validating the SEND model’s results
To evaluate our out-of-sample performance we use a complete set of real historic data spanning multiple years and apply a type of cross-validation known as “forward chaining”. With this approach, we give the model a year of historic training data to learn from and ask it to make predictions for the following year. We then compare these predictions with the true values for our historic dataset. The key thing is the model never gets to “see” the real values for the year it’s making a prediction for. This simulates reality, where we only have data to give to our model on SEND populations up to the current school year, but we want to know how accurate its predictions are in the future.
We repeat the forward chaining process for each year of historic SEND information we have, starting with the first year and rolling through the dataset until the final year. Many Local Authorities are able to provide four years’ data, so we’ve used four years to test on. The diagram below shows how we partition the dataset into separate years. We then pass the first year of data to the model to learn from and compare its predictions with the true values for years two, three and four. The process is then repeated, using years one and two for training and testing the predictions for years three and four. Finally we train the model on years one to three of the data and test the outputs for year four. We average the different test results to get a robust indication of how well the model is performing.
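The forward chaining steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not our production code: the function names, the stand-in "carry the last value forward" model and the accuracy measure (100% minus the absolute percentage error) are all assumptions made for the example, and the population figures are made up.

```python
from statistics import mean


def forward_chaining_splits(years):
    """Yield (train_years, test_years) pairs with an expanding training window."""
    for cut in range(1, len(years)):
        yield years[:cut], years[cut:]


def percentage_accuracy(predicted, actual):
    # Assumed metric for this sketch: 100% minus the absolute percentage error.
    return 100.0 * (1.0 - abs(predicted - actual) / actual)


def last_value_model(train_totals, n_ahead):
    # Hypothetical stand-in model: repeats the last observed total.
    # The real SEND model is far more detailed than this.
    return [train_totals[-1]] * n_ahead


def validate(totals_by_year, model):
    """Average the accuracy across all forward chaining folds."""
    years = sorted(totals_by_year)
    fold_scores = []
    for train_years, test_years in forward_chaining_splits(years):
        # The model only ever sees the training years.
        train = [totals_by_year[y] for y in train_years]
        predictions = model(train, len(test_years))
        scores = [
            percentage_accuracy(p, totals_by_year[y])
            for p, y in zip(predictions, test_years)
        ]
        fold_scores.append(mean(scores))
    return mean(fold_scores)


if __name__ == "__main__":
    # Made-up totals for four school years, purely for illustration.
    totals = {1: 2000, 2: 2100, 3: 2180, 4: 2300}
    for train_years, test_years in forward_chaining_splits(sorted(totals)):
        print("train on", train_years, "-> test on", test_years)
    print("average accuracy (%):", validate(totals, last_value_model))
```

With four years of data this produces three folds: train on year one and test on years two to four, train on years one and two and test on years three and four, then train on years one to three and test on year four.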
What do the results look like?
Overall, we’re finding that we can predict the total future SEND population of an authority to over 99% accuracy. This is significantly better than the optimal performance of simpler modelling methods such as linear regression, which look only at total populations. We see many of our clients using these methods, often implemented in Excel, as a benchmark against our more complex modelling approach. The chart below shows a comparison of results for our model vs linear regression, with both methods being asked to predict total SEND population one year ahead at three separate points using real historical data from a local authority.
Of course we are pleased to see that this shows our methods to be more accurate than a standard linear regression. However, more importantly, our more complex modelling method gives a detailed breakdown of pupil numbers by setting, need, and age, which are essential for many of the policy scenarios that our clients need to think about. This means our model is achieving excellent overall accuracy, combined with a capability to dive far deeper into granular trends than standard methods can offer.
Running the validation for individual need and setting types has also helped us to identify individual cases, such as a specific need in an academic year, where policy or local SEND provision decisions have changed over the course of the input data. We’ve been able to spot these changes, discuss them with the teams using the results, and account for them in the model to make even more reliable future predictions.
Get in touch at firstname.lastname@example.org if you’d like to know more about how all of this works.