Dropout
Last updated June 14, 2026
What is Dropout in simple terms?
In simple terms, dropout trains a neural network with random parts switched off each time, so it can't lean on any one too heavily — like a team that drills with random players sitting out.
What is Dropout?
Dropout is a technique for training neural networks that randomly switches off a fraction of the network's units on each training step, which discourages the network from over-relying on any single part and helps it generalize better to new data.
Dropout is a method for making neural networks better at handling new, unseen data, and the basic idea is almost playfully simple: during training, on each pass, you randomly turn off a portion of the network's units (its "neurons"), as if they weren't there. Which ones get switched off changes randomly every step, and a typical setting drops somewhere between 20% and 50% of them at a time. The full network is still used once training is done; dropout happens only *while learning*. The point of this deliberate sabotage is to stop the network leaning too heavily on any single unit or any cozy little group of units that always fire together. If any neuron might vanish at any moment, the network is forced to spread the work around and build in redundancy, so no one part becomes a single point of failure.
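The random switching-off described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `dropout` and the parameter `rate` are assumptions chosen for this example, and it uses the common "inverted" variant, in which the surviving units are scaled up during training so the average signal stays the same.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero a random subset of units and rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_prob = 1.0 - rate
    # Each unit is kept independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones(8)

# A different random set of units is silenced on every step.
for step in range(3):
    print(step, dropout(activations, rate=0.5, rng=rng))
```

Each printed row should show a different pattern of zeros, which is the "random players sitting out" effect in miniature.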
The reason this is worth doing comes back to overfitting — the failure where a model learns its training data too closely, memorizing quirks instead of the general pattern, and then stumbles on anything new. Networks are prone to forming brittle co-dependencies, where certain neurons only work in concert and the network's success hinges on that exact arrangement, which happens to fit the training data and nothing else. Dropout breaks those fragile partnerships by making them unreliable: a neuron can't count on its usual partners being present, so each one has to learn to pull its weight more independently. This makes dropout a form of *regularization* — a family of techniques that gently discourage a model from over-fitting, pushing it toward the general rule rather than the specific examples.
The team analogy captures the spirit well. Imagine a coach who, at every training session, benches a random handful of players and makes the rest run the drills without them. Nobody knows in advance who'll be sitting out, so the team can't build its whole game plan around one irreplaceable star — every player has to become capable, and the squad develops genuine depth. Come match day, everyone plays, and the team is far more resilient than one that always practiced with the same fixed lineup and would collapse if the star were injured. Dropout does exactly this to a network: by randomly sidelining parts during training, it produces a network whose competence is spread out and robust, rather than precariously balanced on a few specialized parts — and that robustness is what helps it perform on data it has never seen.
Real-world example of Dropout
Think of an orchestra preparing for a concert with a conductor who has an unusual rehearsal habit. At each practice, she randomly asks a few players to stay silent for a passage — a couple of violinists here, an oboist there, different ones each time. Forced to keep the music whole without knowing who'll drop out next, the players learn to listen harder, cover for each other, and stop relying on any one person to carry a part. By concert night, when everyone plays together, the ensemble is remarkably resilient: a player having an off night doesn't unravel the performance, because the whole group learned to hold the piece together without leaning on any single chair. That randomized silencing in rehearsal — sidelining parts on purpose so the whole becomes sturdier — is precisely what dropout does inside a neural network during training.
Frequently asked questions about Dropout
What is the difference between dropout and other regularization?
Dropout is one specific kind of regularization, the broad family of techniques that discourage a model from overfitting. What makes dropout distinctive is *how* it does it: by randomly switching off parts of a neural network during training, forcing the network to spread its learning out rather than depend on a few units. Other regularization methods reach the same goal differently, for example by penalizing large weights (often called weight decay or L2 regularization), or by stopping training early before the model starts memorizing. They share the aim of better generalization; dropout's particular trick is the random "turning off" of units, which is specific to neural networks.
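For contrast, here is a rough sketch of the "penalize large values" style of regularization mentioned above. It assumes an L2 penalty on the weights; the function name `l2_penalty`, the coefficient `strength`, and the placeholder `data_loss` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def l2_penalty(weights, strength):
    """Sum of squared weights, scaled by a hypothetical `strength` coefficient."""
    return strength * sum(np.sum(w ** 2) for w in weights)

small = [np.full((2, 2), 0.1)]  # a layer with small weights
large = [np.full((2, 2), 3.0)]  # a layer with large weights

data_loss = 0.25  # placeholder standing in for the ordinary training loss

# The penalty is added to the loss, so large weights make the total worse.
print(data_loss + l2_penalty(small, strength=0.01))
print(data_loss + l2_penalty(large, strength=0.01))
```

Unlike dropout, nothing here is random and no unit is ever switched off: the pressure comes entirely from an extra term in the loss.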
How does dropout work?
On each training step, dropout randomly selects a fraction of the network's units and temporarily ignores them, as if they'd been removed for that pass, and it picks a different random set each time. This stops any unit from relying on specific others always being present, so each learns to contribute more independently and the network builds in redundancy. Importantly, dropout is applied only during training; when the trained network is actually used, all the units are active. Because more units are active then than on any training step, the activations are rescaled (in most implementations during training, by dividing the surviving units by the keep probability) so that the expected signal is the same in both modes. The net effect is a network that doesn't hinge on any fragile arrangement of parts, which makes it more robust on new data.
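The difference between training and use can be illustrated with a small sketch. The `Dropout` class below is a hypothetical minimal layer written for this example, not any particular library's API; it assumes inverted dropout, so the rescaling happens in training mode and inference is a plain pass-through.

```python
import numpy as np

class Dropout:
    """Hypothetical minimal layer: random masking in training, pass-through otherwise."""

    def __init__(self, rate, seed=0):
        self.rate = rate
        self.rng = np.random.default_rng(seed)
        self.training = True

    def __call__(self, x):
        if not self.training or self.rate == 0.0:
            return x  # inference: every unit is active, nothing is masked
        keep_prob = 1.0 - self.rate
        mask = self.rng.random(x.shape) < keep_prob
        return x * mask / keep_prob

layer = Dropout(rate=0.3)
x = np.array([1.0, 2.0, 3.0, 4.0])

# Training mode: a fresh random mask on every call.
train_outputs = np.stack([layer(x) for _ in range(10_000)])
mean_train = train_outputs.mean(axis=0)

# Inference mode: the input passes through untouched.
layer.training = False
inference_output = layer(x)

print(mean_train)         # averages out close to x
print(inference_output)   # identical to x
```

Averaged over many training passes, the masked outputs land close to the unmasked input, which is why the full network can be used at inference without any further adjustment.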
What is dropout used for?
Dropout is used to reduce overfitting and improve how well a neural network generalizes to data it hasn't seen. It is a common and valuable tool when training deep networks, which are otherwise prone to memorizing their training data. It's especially useful when the network is large relative to the amount of training data available. By making the network's learning more distributed and less brittle, dropout helps produce models that hold up in the real world rather than only excelling on the examples they were trained on.