Long Short-Term Memory (LSTM)

Advanced | Deep Learning

Last updated June 14, 2026

What is Long Short-Term Memory in simple terms?

In simple terms, long short-term memory is a sequence-reading AI with a better memory — like keeping a notebook as you read, jotting down what matters, updating it, and crossing out what no longer does.

What is Long Short-Term Memory?

Long short-term memory (LSTM) is a type of recurrent neural network designed to remember information over long sequences, using built-in "gates" that decide what to keep, what to update, and what to discard — overcoming the tendency of plain recurrent networks to forget earlier inputs.

Long short-term memory (LSTM) is a particular kind of recurrent neural network: a network built to process sequences like text or speech one step at a time while carrying a memory forward. It was created to fix a specific, stubborn weakness in plain recurrent networks: their memory of early inputs fades as a sequence grows longer, so by the end of a long passage they've effectively forgotten how it began. (In training, this shows up as the vanishing gradient problem: the learning signal from early steps shrinks toward zero as it is passed back through many steps.) That's a serious problem for language, where a detail at the start of a paragraph often matters for understanding the end. LSTM addresses this with a smarter internal memory that can deliberately hold on to important information across long stretches, which is exactly what the slightly odd name is getting at: it gives the network a *long-lasting* version of its *short-term* working memory.

The clever part is *how* it manages that memory: through small internal mechanisms usually called gates. Rather than blindly cramming every new input into its memory and letting old information get crowded out, an LSTM makes deliberate decisions at each step — roughly, what to throw away from its memory, what new information to add, and what to read out as its current answer. You can picture it as reading a long report with a notebook beside you. You don't try to hold every sentence in your head; instead you jot down the points that matter, update them as the report develops, and cross out notes that no longer apply. By the end, your notebook still holds the important threads from the very beginning, because you actively chose to keep them. An LSTM's gates do the same job — selectively keeping, updating, and forgetting — which is what lets useful information survive across a long sequence.
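The three decisions described above are commonly written as a small set of equations. The block below is the widely used textbook formulation, offered as a reference sketch rather than as this article's own notation. The symbols are introduced here for illustration: x_t is the current input, h_t the current output (hidden state), c_t the memory (cell state), W and b are learned weights and biases, σ is the sigmoid function, and ⊙ is element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) && \text{forget gate: what to throw away} \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) && \text{input gate: what new information to add} \\
\tilde{c}_t &= \tanh\left(W_c [h_{t-1}, x_t] + b_c\right) && \text{candidate values to write} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{updated memory (cell state)} \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) && \text{output gate: what to read out} \\
h_t &= o_t \odot \tanh(c_t) && \text{current output (hidden state)}
\end{aligned}
```

Because each gate is a sigmoid with values between 0 and 1, a forget-gate value close to 1 keeps a memory entry nearly intact, while a value close to 0 erases it. This is the notebook analogy in arithmetic form: keep, add, and read out are each a multiplication by a number between 0 and 1.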

For a long stretch, LSTMs were the workhorse behind serious sequence tasks such as machine translation, speech recognition, and text generation, precisely because that durable memory let them handle longer, more complex inputs than plain recurrent networks could. They mark a genuine milestone in the history of getting machines to work with language. As with recurrent networks generally, the transformer architecture has since overtaken LSTMs for most large-scale language work. A transformer takes in a whole sequence at once and connects distant parts directly, rather than passing memory along step by step, so its training can be parallelized across the sequence and typically runs far faster on modern hardware. Even so, LSTMs are still used, especially for certain time-series and smaller sequence problems, and understanding them is one of the clearest ways to grasp how the idea of "memory" in a neural network actually works.

Real-world example of Long Short-Term Memory

Imagine a court stenographer following a long, winding testimony and producing a faithful written record. A witness mentions a specific date early on, rambles through ten minutes of unrelated detail, then refers back to "that day" near the end — and the record has to connect the two correctly. A stenographer manages this not by holding every single word in their head, but by keeping organized notes: locking in the key fact (the date), letting the irrelevant rambling pass without cluttering those notes, and pulling the date back out when it becomes relevant again. An LSTM handles a long sequence in just this spirit — deliberately retaining the details that will matter later, ignoring the noise in between, and surfacing the right information at the right moment. That selective, note-keeping memory is exactly what lets it bridge the long gap between the date mentioned early and the reference to it much later.

Frequently asked questions about Long Short-Term Memory

What is the difference between long short-term memory and a plain recurrent neural network?

An LSTM is a specific, more capable type of recurrent neural network. A plain recurrent network carries a single running memory forward and tends to lose track of early information as a sequence gets long — its memory simply fades. An LSTM adds internal "gates" that actively decide what to keep, update, and discard, which lets it hold important information across much longer stretches. So they share the same basic step-by-step, memory-carrying design, but the LSTM's gated memory is built to overcome the forgetfulness that limits the simpler version on long sequences.
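The contrast can be illustrated with a toy, single-number memory. The sketch below is not a trained network: the weights (`w`, `u`), the fixed `forget` value, and the rule for opening the write gate are all hand-picked assumptions, and the function names are hypothetical. It only shows the qualitative effect, namely that a plain recurrent update squashes an early signal a little more at every step, while a gated memory that chooses to keep it changes very little.

```python
import math

def plain_rnn_trace(inputs, w=0.5, u=1.0):
    """Plain recurrent update: the old memory is re-squashed at every step."""
    h = 0.0
    trace = []
    for x in inputs:
        h = math.tanh(w * h + u * x)
        trace.append(h)
    return trace

def gated_cell_trace(inputs, forget=0.99):
    """Gated update with hand-set gates (an assumption, not learned values)."""
    c = 0.0
    trace = []
    for x in inputs:
        # Toy input gate: open only when there is something to store.
        write = 1.0 if x != 0.0 else 0.0
        # Keep most of the old memory, add new content only if the gate is open.
        c = forget * c + write * math.tanh(x)
        trace.append(c)
    return trace

# One meaningful input at the start, followed by 29 steps of "nothing".
inputs = [1.0] + [0.0] * 29
rnn = plain_rnn_trace(inputs)
gated = gated_cell_trace(inputs)

print(f"plain RNN memory after 30 steps:  {rnn[-1]:.10f}")
print(f"gated cell memory after 30 steps: {gated[-1]:.10f}")
```

In a real LSTM the gate values are computed from the input and the previous output and are learned during training, rather than being fixed as they are here.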

How does long short-term memory work?

An LSTM processes a sequence one step at a time, like any recurrent network, but it maintains a memory that it manages with internal gates. At each step, these gates decide what to remove from the memory, what new information to write into it, and what to output as the current result. Because the network actively curates its memory rather than just overwriting it, useful information from early in the sequence can be preserved for a long time instead of fading away. This deliberate keep-update-forget control is the core mechanism that lets LSTMs handle long-range dependencies that plain recurrent networks miss.
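The step-by-step process described above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard LSTM update, assuming small random untrained weights. The names `lstm_step`, `init_params`, and `run_sequence` are hypothetical helpers introduced for this example, not part of any library.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, v, b):
    """Compute W @ v + b for a list-of-rows matrix W and list vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step. Returns the new output h and the new memory c."""
    z = h_prev + x  # concatenate previous output and current input
    f = [sigmoid(a) for a in affine(params["Wf"], z, params["bf"])]    # what to remove
    i = [sigmoid(a) for a in affine(params["Wi"], z, params["bi"])]    # what to write
    g = [math.tanh(a) for a in affine(params["Wg"], z, params["bg"])]  # candidate content
    o = [sigmoid(a) for a in affine(params["Wo"], z, params["bo"])]    # what to output
    # Curate the memory: scale down old entries, then add gated new content.
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases (untrained, for illustration only)."""
    rng = random.Random(seed)
    n = hidden_size + input_size
    params = {}
    for name in ("f", "i", "g", "o"):
        params["W" + name] = [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                              for _ in range(hidden_size)]
        params["b" + name] = [0.0] * hidden_size
    return params

def run_sequence(xs, params, hidden_size):
    """Feed a sequence through the cell one step at a time."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in xs:
        h, c = lstm_step(x, h, c, params)
    return h, c

params = init_params(input_size=2, hidden_size=3)
sequence = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
h, c = run_sequence(sequence, params, hidden_size=3)
print("final output h:", h)
print("final memory c:", c)
```

The key line is the memory update: old memory is multiplied by the forget gate rather than overwritten, so when that gate stays near 1 an early value can pass through many steps almost unchanged. Practical work would normally use a deep learning framework's built-in LSTM layer and trained weights instead of a hand-written loop like this one.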

What is long short-term memory used for?

LSTMs are used for sequence tasks where information has to be remembered across long stretches: machine translation, speech recognition, text generation, handwriting recognition, and time-series prediction such as forecasting from sensor or financial data. For years they were the leading approach to much of this work. Today, transformers have replaced them for most large-scale language tasks, but LSTMs remain useful for certain time-series and smaller-scale sequence problems, and they're a key step in understanding how neural networks can be given a durable, working memory.