Cross-Validation
Last updated June 11, 2026
What is Cross-Validation in simple terms?
In simple terms, cross-validation means taking turns at being the exam. You split your data into parts, train on most of them and test on the one left out, then rotate so every part gets a turn as the exam.
What is Cross-Validation?
Cross-validation is a technique for testing how well a machine learning model will perform on new data by repeatedly splitting the available data into training and testing portions in different ways, then averaging the results for a more reliable estimate.
Cross-validation is a method for getting an honest estimate of how well a model will do on data it hasn't seen. The basic way to test a model is to hold back some data, train on the rest, and check the held-back portion. But that single split can be misleading — you might, by luck, hold back an unusually easy or unusually hard slice, and judge the model too kindly or too harshly. Cross-validation fixes this by splitting the data several different ways and testing on each, so the verdict doesn't hinge on one lucky or unlucky division.
The most common form, k-fold cross-validation, splits the data into k roughly equal parts (called folds), often five or ten. The model is trained on all but one part and tested on the part left out. The process then repeats, each time leaving out a different part, until every part has served once as the test. The results from all the rounds are averaged into a single score. Every example gets used for both training and testing across the rounds, and the result is an average rather than a one-off. That makes the estimate much more reliable than a single split, especially when data is limited and a single test set would be too small to mean much.
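The rotation described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation. The helper names `k_fold_indices` and `k_fold_scores`, the shuffle seed, and the toy majority-class model are all hypothetical, chosen only so the example runs without any libraries.

```python
import random
from collections import Counter
from statistics import mean

def k_fold_indices(n, k=5, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds whose sizes differ by at most one."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]

def k_fold_scores(X, y, fit, score, k=5, seed=0):
    """Train and test k times, holding out a different fold each round; return (average, per-fold scores)."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # The model is refit from scratch on the other k-1 folds every round.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return mean(scores), scores

# Hypothetical toy model: always predict the most common label seen in training.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]

def accuracy(label, X_test, y_test):
    return sum(t == label for t in y_test) / len(y_test)

X = list(range(20))
y = [0] * 12 + [1] * 8
average, per_fold = k_fold_scores(X, y, fit_majority, accuracy, k=5)
print(f"per-fold accuracy: {per_fold}, average: {average:.2f}")
```

Swapping `fit_majority` and `accuracy` for a real model and metric leaves the rotation itself unchanged. The shuffle is there so that data stored in a meaningful order, such as sorted by label, does not produce lopsided folds.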
Cross-validation is a staple technique in machine learning, used both to judge a model fairly and to choose between options. It's particularly valuable for detecting overfitting: a model that has merely memorized its training data will score well on the data it trained on but poorly across cross-validation's rotating tests, exposing the gap. It also helps when tuning a model's settings, by giving a reliable way to compare choices. Whenever someone needs to know how a model will really perform — not just how it looks on one convenient test — cross-validation is the standard tool for finding out.
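To make the overfitting check concrete, the sketch below compares a memorizing model's score on its own training data with its cross-validated score. Everything here is an illustrative assumption. The "model" is a hypothetical 1-nearest-neighbor lookup that stores the training set and answers with the label of the closest stored point. The data is a deliberately hard toy set of 20 points on a line whose labels alternate 0, 1, 0, 1, so a point's closest neighbors always carry the opposite label.

```python
import random
from statistics import mean

def k_fold_indices(n, k=5, seed=0):
    # Shuffle 0..n-1 and deal the indices into k folds.
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def predict_1nn(train_X, train_y, x):
    # Memorize the training set; answer with the label of the closest stored point.
    nearest = min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    return train_y[nearest]

def accuracy(train_X, train_y, test_X, test_y):
    hits = sum(predict_1nn(train_X, train_y, x) == t for x, t in zip(test_X, test_y))
    return hits / len(test_X)

X = [float(i) for i in range(20)]
y = [i % 2 for i in range(20)]  # labels alternate 0, 1, 0, 1, ...

# Score on the very data the model memorized.
train_acc = accuracy(X, y, X, y)

# Score under 5-fold cross-validation.
fold_scores = []
for test_idx in k_fold_indices(len(X), k=5):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(X)) if i not in held_out]
    fold_scores.append(accuracy([X[i] for i in train_idx], [y[i] for i in train_idx],
                                [X[i] for i in test_idx], [y[i] for i in test_idx]))
cv_acc = mean(fold_scores)

print(f"training accuracy: {train_acc:.2f}, cross-validated accuracy: {cv_acc:.2f}")
```

Because every training point is its own nearest neighbor, the training accuracy is a perfect 1.00. Every cross-validation round, however, misses at least one held-out point, so the averaged score comes out lower. That gap between the two numbers is the signature of memorization the paragraph above describes.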
Real-world example of Cross-Validation
A small medical research team has just 200 patient records and wants to build a model that predicts a particular condition. With so little data, a single train-and-test split is risky: hold back 40 records and the model might look great or terrible depending purely on which 40 happened to land in that group. So they use cross-validation. They divide the 200 records into five groups of 40, then train and test five times — each round training on 160 records and testing on the 40 it left out, until every patient has been in the test group exactly once. Averaging the five scores gives them a far steadier estimate of how the model will perform on future patients than any single split could. With precious little data, that rotation is how they wring a trustworthy answer out of what they have.
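The bookkeeping in this example can be checked with a short sketch. It assumes the 200 records can be stood in for by the integers 0 to 199 and uses an arbitrary shuffle seed (42). No actual model is trained. The only point is to confirm the group sizes and that each record lands in a test group exactly once.

```python
import random

N_RECORDS = 200  # patient records in the example
K = 5            # number of groups (folds)

# Integers stand in for the real records; the shuffle seed is arbitrary.
record_ids = list(range(N_RECORDS))
random.Random(42).shuffle(record_ids)
folds = [record_ids[i::K] for i in range(K)]

times_tested = {r: 0 for r in record_ids}
for round_no, test_group in enumerate(folds, start=1):
    held_out = set(test_group)
    training_group = [r for r in record_ids if r not in held_out]
    for r in test_group:
        times_tested[r] += 1
    # Each round prints: train on 160, test on 40
    print(f"round {round_no}: train on {len(training_group)}, test on {len(test_group)}")
```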
Frequently asked questions about Cross-Validation
What is the difference between cross-validation and a single train-test split?
A single train-test split holds back one portion of the data for testing and uses the rest for training — quick, but the result depends heavily on which examples happen to land in that one test portion. Cross-validation repeats the split several different ways, rotating which portion is held back, and averages the results. This makes the performance estimate much more reliable, because it doesn't hinge on a single lucky or unlucky division. The trade-off is that cross-validation takes more computation, since the model is trained and tested multiple times instead of once.
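The difference can be made visible with a small experiment. In the sketch below, each of five held-out slices is scored on its own. Each of those numbers is exactly what a single train-test split would report if that slice happened to be the one held back. Cross-validation reports their average instead. The synthetic data (two overlapping normal distributions), the seed, and the threshold "model" that cuts halfway between the two class averages are all assumptions made for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)
# Synthetic data: 25 points per class, drawn from overlapping normal distributions.
X = [rng.gauss(0.0, 1.0) for _ in range(25)] + [rng.gauss(1.5, 1.0) for _ in range(25)]
y = [0] * 25 + [1] * 25

def fit_threshold(xs, ys):
    # Hypothetical toy model: cut halfway between the mean x of each class.
    m0 = mean(x for x, t in zip(xs, ys) if t == 0)
    m1 = mean(x for x, t in zip(xs, ys) if t == 1)
    return (m0 + m1) / 2

def score_holdout(test_idx):
    """Train on everything outside test_idx, then return accuracy on test_idx."""
    held_out = set(test_idx)
    train_idx = [i for i in range(len(X)) if i not in held_out]
    cut = fit_threshold([X[i] for i in train_idx], [y[i] for i in train_idx])
    hits = sum((X[i] > cut) == (y[i] == 1) for i in test_idx)
    return hits / len(test_idx)

indices = list(range(len(X)))
rng.shuffle(indices)
folds = [indices[i::5] for i in range(5)]

single_split_scores = [score_holdout(f) for f in folds]  # one model fit each
cv_score = mean(single_split_scores)                      # five model fits in total

print("single-split estimates:", [round(s, 2) for s in single_split_scores])
print(f"cross-validated estimate: {cv_score:.2f}")
```

The spread between the smallest and largest single-split estimates is the luck of the draw described in the answer above. The trade-off is visible too: the cross-validated figure costs five model fits instead of one.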
How does cross-validation work?
The data is divided into several roughly equal parts, with five or ten being common choices. The model is trained on all the parts but one and tested on the part left out; this repeats so that each part takes a turn as the test set, with the model retrained each round. The scores from every round are then averaged into one overall estimate. Because every example is used for both training and testing across the rounds, and the final figure is an average rather than a single result, the estimate reflects the model's real performance more faithfully than one split would.
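Written as a formula, using notation introduced here only for illustration: split the data into folds D_1 to D_k, let \hat{f}^{(-i)} be the model trained on every fold except D_i, and let S(f, D) be any score (accuracy, error rate, and so on) of model f on data D. The cross-validated estimate is then the plain average of the k round scores:

```latex
\mathrm{CV}_k \;=\; \frac{1}{k} \sum_{i=1}^{k} S\!\left(\hat{f}^{(-i)},\, D_i\right)
```

With k = 5, each \hat{f}^{(-i)} is trained on four fifths of the data and each term of the sum is computed on the remaining fifth.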
What is cross-validation used for?
It's used to estimate honestly how well a machine learning model will perform on new, unseen data, and to compare options when building one. It's especially useful for spotting overfitting — a model that memorized its training data will do poorly across the rotating tests — and for tuning a model's settings reliably. It's particularly valuable when data is limited, since it makes the most of a small dataset by reusing every example for both training and testing rather than sacrificing a chunk to a single test set.
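For tuning, one common pattern is to run cross-validation once per candidate setting and keep the setting with the best average score. The sketch below assumes a hypothetical k-nearest-neighbors classifier on one-dimensional synthetic data with roughly 20% of labels flipped. It treats the number of neighbors (`n_neighbors`) as the setting being tuned. The candidate list, seeds, and helper names are illustrative choices, not a prescribed procedure.

```python
import random
from collections import Counter
from statistics import mean

def k_fold_indices(n, k=5, seed=0):
    # Shuffle 0..n-1 and deal the indices into k folds.
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def knn_predict(train_X, train_y, x, n_neighbors):
    # Majority vote among the n_neighbors closest training points (1-D features for brevity).
    nearest = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))[:n_neighbors]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cv_accuracy(X, y, n_neighbors, k=5):
    # Average held-out accuracy over k rounds for one candidate setting.
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        tx = [X[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        hits = sum(knn_predict(tx, ty, X[i], n_neighbors) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)

rng = random.Random(7)
X = [rng.uniform(0, 10) for _ in range(60)]
# True rule: label 1 when x > 5; about 20% of labels are flipped as noise.
y = [int(x > 5) if rng.random() > 0.2 else int(x <= 5) for x in X]

candidates = [1, 3, 5, 9]  # odd values avoid tied votes with two classes
results = {n: cv_accuracy(X, y, n) for n in candidates}
best = max(results, key=results.get)
print(results, "-> chosen n_neighbors:", best)
```

One caution: because the winning score was used to make the choice, it tends to be slightly optimistic. Checking the chosen setting on data kept aside from the start, or using nested cross-validation, gives a cleaner final estimate.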