Learning Rate
Last updated June 14, 2026
What is Learning Rate in simple terms?
In simple terms, the learning rate is how big a step an AI takes each time it corrects a mistake while training. Big steps move fast but can overshoot; tiny steps are careful but slow.
What is Learning Rate?
The learning rate is a setting that controls how large an adjustment a machine learning model makes each time it corrects itself during training. It is a small but pivotal hyperparameter that determines whether the model learns smoothly, learns too slowly, or never settles at all.
When a machine learning model trains, it improves in a long series of tiny corrections: it makes a guess, sees how wrong it was, and nudges its internal settings to do a little better next time. The learning rate is the single number that decides how big each of those nudges is. Make it large and the model takes bold steps, changing a lot at once; make it small and it inches forward cautiously. It's one of the most important hyperparameters — settings you choose before training rather than ones the model learns — because that one number heavily shapes whether training works at all.
The classic way to picture it is walking down into a valley in thick fog, trying to reach the lowest point — which stands for the model's best, least-wrong state. The learning rate is the size of your stride. Take huge strides and you cover ground fast, but near the bottom you keep leaping clean over the lowest point and bouncing between the slopes, never settling — the model never lands on a good answer. Take tiny, shuffling steps and you'll get there eventually, but it could take painfully long, and you might get stuck in a shallow dip along the way, mistaking it for the bottom. The sweet spot is a stride big enough to make real progress but small enough to settle gently at the lowest point. This is the same downhill-walking idea behind gradient descent, the method that decides which direction to step; the learning rate decides how far.
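As a rough illustration of the stride analogy, here is a minimal Python sketch of gradient descent on the one-dimensional curve f(x) = x**2, whose lowest point is at x = 0. The curve, the starting point, the function name `descend`, and the three learning rate values are all assumptions chosen for illustration. They are not taken from any particular model, and a curve this simple has no shallow dips to get stuck in.

```python
def descend(learning_rate, start=10.0, steps=50):
    """Run plain gradient descent on f(x) = x**2 and return every point visited."""
    x = start
    path = [x]
    for _ in range(steps):
        gradient = 2.0 * x                # derivative of x**2 at the current point
        x = x - learning_rate * gradient  # step downhill, scaled by the learning rate
        path.append(x)
    return path


# Illustrative values only: one stride too long, one too short, one in between.
too_large = descend(1.1)
too_small = descend(0.001)
balanced = descend(0.3)

for name, path in [("too large", too_large), ("too small", too_small), ("balanced", balanced)]:
    print(f"{name:>9}: distance from the bottom after 50 steps = {abs(path[-1]):.4g}")
```

On this particular curve each step multiplies x by (1 - 2 * learning_rate). A rate of 1.1 therefore flips the sign and grows the distance on every step, 0.001 barely moves, and 0.3 shrinks the distance quickly.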
Because the ideal value depends on the model and the data and isn't obvious in advance, choosing the learning rate is a central part of tuning. A value that's too high often fails dramatically: training "blows up" and the model gets worse, not better. One that's too low just wastes time. In practice, people rarely keep it fixed: a common and effective approach is to start with larger steps to cover ground quickly, then gradually shrink them as training goes on, so the model settles precisely at the end. This shrinking schedule is called learning rate decay. Get this one setting right and training is smooth and efficient; get it badly wrong and even a well-designed model may never learn properly, which is why it's so often the first dial practitioners reach for.
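One way such a schedule could look is exponential decay, where the learning rate is multiplied by a fixed factor below 1 after every step. The sketch below shows only one possible schedule among several. The names `exponential_decay` and `descend_with_decay`, the starting rate of 0.4, the factor of 0.95, and the toy curve f(x) = x**2 are assumptions made for illustration.

```python
def exponential_decay(initial_rate, decay_factor, step):
    """Learning rate after `step` updates: initial_rate * decay_factor ** step."""
    return initial_rate * decay_factor ** step


def descend_with_decay(initial_rate=0.4, decay_factor=0.95, start=10.0, steps=50):
    """Gradient descent on f(x) = x**2 with a learning rate that shrinks each step."""
    x = start
    rates = []
    for step in range(steps):
        rate = exponential_decay(initial_rate, decay_factor, step)
        rates.append(rate)
        gradient = 2.0 * x       # derivative of x**2
        x = x - rate * gradient  # early steps are large, later steps are gentler
    return x, rates


final_x, rates = descend_with_decay()
print(f"first rate = {rates[0]:.3f}, last rate = {rates[-1]:.4f}")
print(f"distance from the bottom = {abs(final_x):.3g}")
```

The toy curve is smooth enough that a fixed rate would also work here. The sketch only shows the mechanics of shrinking the step size over time.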
Real-world example of Learning Rate
Think of tuning a guitar by ear toward a target note. If you crank the tuning peg in big, aggressive turns, you swing wildly past the note — too sharp, then over-correct to too flat, then sharp again — and the string never quite lands in tune. If you turn it in almost imperceptible amounts, you'll get there perfectly but it takes forever. What a good musician does is turn boldly while the note is far off, then make smaller and smaller adjustments as it gets close, easing onto the exact pitch. That's precisely how a well-set learning rate behaves: large, productive steps early on, then progressively gentler ones to settle on the right answer — and the entire difference between a quick, clean tuning and an endless, frustrating one is how big you let those turns be.
Frequently asked questions about Learning Rate
What is the difference between the learning rate and the number of epochs?
Both are hyperparameters set before training, but they control different things. The learning rate is how big a step the model takes each time it adjusts itself — the size of each correction. The number of epochs is how many times the model passes over the whole training dataset — essentially how long it trains. One governs the size of each move, the other the total amount of training. They interact: a tiny learning rate may need many more epochs to reach a good result, while too large a learning rate can fail no matter how many epochs you allow.
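The interaction between the two can be sketched on a toy problem. The example below fits a single weight `w` so that `w * x` matches data generated from y = 3 * x, and counts how many epochs (full passes over the data) each learning rate needs. The dataset, the function name `epochs_to_fit`, the tolerance, and the three rates are hypothetical choices for illustration, not a recipe for real models.

```python
import math

# Hypothetical toy dataset following y = 3 * x, so the ideal weight is 3.0.
XS = [1.0, 2.0, 3.0]
YS = [3.0, 6.0, 9.0]


def epochs_to_fit(learning_rate, max_epochs=200, tolerance=0.01):
    """Return the number of epochs until w is within `tolerance` of 3.0,
    or None if that does not happen within `max_epochs`."""
    w = 0.0
    for epoch in range(1, max_epochs + 1):
        # One epoch is one full pass over the training data.
        for x, y in zip(XS, YS):
            error = w * x - y
            gradient = error * x           # derivative of 0.5 * error**2 with respect to w
            w -= learning_rate * gradient  # one correction, scaled by the learning rate
        if not math.isfinite(w):
            return None                    # training blew up
        if abs(w - 3.0) < tolerance:
            return epoch
    return None


for rate in (0.1, 0.01, 0.5):
    print(f"learning rate {rate}: epochs needed = {epochs_to_fit(rate)}")
```

In this sketch the smaller rate needs many more epochs than the moderate one, while the largest rate returns None however many epochs it is given, because every pass pushes `w` further from the target.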
How does the learning rate work?
During training, the model repeatedly works out the direction that would reduce its error, then moves its internal settings in that direction. The learning rate is the multiplier applied to that move, which sets its size: a large value makes a big jump, a small value a cautious step. Too large and the model overshoots the best answer and never settles; too small and it crawls or gets stuck. Because the best value isn't obvious in advance, it's tuned by trial and error, and it's common to decay it over training, starting with larger steps and shrinking them, so the model makes fast early progress and then settles precisely.
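Written as a formula, and assuming plain gradient descent, a single update can be sketched as follows. The symbols are a common convention rather than anything defined in this article: \theta_t stands for the model's internal settings after t updates, \eta for the learning rate, and \nabla L(\theta_t) for the gradient of the error, which points uphill, hence the minus sign.

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)
```

Many optimizers used in practice modify this step, for example by adding momentum, but the learning rate still scales the size of the move.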
What is the learning rate used for?
The learning rate is used to control the speed and stability of training, and tuning it is one of the most important jobs in building almost any machine learning model. Set well, it lets a model learn quickly and settle on a good result; set poorly, it can make training fail outright or drag on needlessly. Because it has such an outsized effect for a single number, it's typically the first hyperparameter practitioners adjust, often paired with a schedule that shrinks it as training progresses, combining early speed with late precision.