Sequence-to-Sequence

Intermediate | Language AI

Last updated June 14, 2026

What is Sequence-to-Sequence in simple terms?

In simple terms, sequence-to-sequence takes a whole input sequence and turns it into a whole new one. It works like an interpreter who listens to your entire sentence before rendering it in another language, rather than swapping word for word.

What is Sequence-to-Sequence?

Sequence-to-sequence is a family of machine learning approaches that take one ordered sequence as input, such as a sentence, and produce another ordered sequence as output, even when the two differ in length. One component reads the whole input and another generates the result.

A lot of useful tasks are really about turning one ordered sequence into another. Translating a sentence turns an English word-sequence into a French one. Summarizing turns a long sequence of sentences into a short one. Answering a question turns a sequence of words into another sequence of words. What makes these hard is that the input and output often don't line up neatly: they can be different lengths, and the order of ideas can shift. You can't just swap each input item for an output item one at a time. Sequence-to-sequence (often written "seq2seq") is the approach built for exactly this — it reads an entire input sequence first, then generates an output sequence of whatever length the task needs.
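To see why item-by-item substitution falls short, here is a minimal sketch that assumes a tiny hand-written English-to-French dictionary. The `LEXICON` table and the `word_for_word` helper are illustrative names, not part of any library. A per-word swap always returns exactly as many words as it was given, in the same order, so it cannot produce the reordered phrase "la maison blanche".

```python
# Hypothetical three-word lexicon, for illustration only.
LEXICON = {"the": "la", "white": "blanche", "house": "maison"}


def word_for_word(tokens):
    """Swap each input word for its dictionary entry, keeping its position."""
    return [LEXICON[token] for token in tokens]


source = ["the", "white", "house"]
reference = ["la", "maison", "blanche"]  # correct French word order

naive = word_for_word(source)
print(naive)               # same length and same order as the input
print(naive == reference)  # False: the adjective and noun were not swapped
```

The limitation is structural. However good the dictionary, this function can never return an output that is longer, shorter, or differently ordered than its input.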

The classic design splits the work between two parts: an encoder and a decoder. The encoder reads the whole input and compresses its meaning into an internal summary — a compact representation of "what was said." The decoder then takes that summary and generates the output sequence one item at a time, each new item informed by the summary and by what it has already produced. This "read it all, then write it all" structure is the key idea: by digesting the full input before committing to any output, the model can handle the reordering and length changes that trip up word-for-word methods. An interpreter works the same way — they wait for the whole sentence before speaking, because the right translation of an early word can depend on a word that comes later.
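The encoder and decoder roles described above can be sketched in plain Python. This is a toy under stated assumptions, not a real translation model. The `ToySeq2Seq` class, its simple recurrent update, and the `<bos>`/`<eos>` marker tokens are all illustrative choices, and the weights are random rather than trained, so the generated tokens are arbitrary. What it does show is the control flow: the encoder folds an input of any length into one fixed-size summary, and the decoder then emits one item at a time until it produces an end marker or hits a length cap.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_hid, x, h):
    """One recurrent update: mix the new input with the running state."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_hid, h))]


class ToySeq2Seq:
    def __init__(self, src_vocab, tgt_vocab, hidden=8, seed=0):
        rng = random.Random(seed)
        self.src_vocab, self.tgt_vocab, self.hidden = src_vocab, tgt_vocab, hidden
        self.src_emb = make_matrix(len(src_vocab), hidden, rng)
        self.tgt_emb = make_matrix(len(tgt_vocab), hidden, rng)
        self.enc_in = make_matrix(hidden, hidden, rng)
        self.enc_hid = make_matrix(hidden, hidden, rng)
        self.dec_in = make_matrix(hidden, hidden, rng)
        self.dec_hid = make_matrix(hidden, hidden, rng)
        self.out_proj = make_matrix(len(tgt_vocab), hidden, rng)

    def encode(self, tokens):
        """Read the whole input and return one fixed-size summary vector."""
        h = [0.0] * self.hidden
        for token in tokens:
            x = self.src_emb[self.src_vocab.index(token)]
            h = rnn_step(self.enc_in, self.enc_hid, x, h)
        return h

    def decode(self, summary, max_len=10):
        """Generate one token at a time, starting from the encoder summary."""
        h = summary
        prev = self.tgt_vocab.index("<bos>")
        output = []
        for _ in range(max_len):
            # Each step sees the running state and the previously emitted token.
            h = rnn_step(self.dec_in, self.dec_hid, self.tgt_emb[prev], h)
            scores = matvec(self.out_proj, h)
            prev = max(range(len(scores)), key=scores.__getitem__)  # greedy pick
            if self.tgt_vocab[prev] == "<eos>":
                break
            output.append(self.tgt_vocab[prev])
        return output


model = ToySeq2Seq(
    src_vocab=["the", "white", "house"],
    tgt_vocab=["<bos>", "<eos>", "la", "maison", "blanche"],
)
summary = model.encode(["the", "white", "house"])
generated = model.decode(summary)
print(len(summary), generated)
```

Note that the output length is decided by the decoder loop, not by the input length, which is what lets this design handle inputs and outputs of different sizes.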

Sequence-to-sequence began as a major advance in machine translation and quickly spread to summarization, question answering, speech transcription, and more. Early versions used recurrent networks for both the encoder and the decoder. A key later refinement, attention, let the decoder look back at relevant parts of the input as it wrote, rather than relying on a single fixed summary, and that idea fed directly into the transformer architecture behind today's large language models. So it's fair to say sequence-to-sequence is less a single fixed model than a *shape* of problem and solution: input sequence in, output sequence out. The honest caveat is that output quality depends on the underlying model and its training; the framework defines the task, not how well it's done.
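The attention idea can be sketched in a few lines. This example assumes dot-product scoring, which is one common choice (early attention models used other scoring functions). The `attend` helper and the two-dimensional vectors are made up for illustration. At each decoding step, the decoder's current state is compared against every encoder state, the scores are normalized into weights that sum to 1, and the weighted mix becomes the context for that step.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Score each encoder state against the decoder state, then blend them."""
    scores = [
        sum(q * k for q, k in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    width = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(width)
    ]
    return weights, context


# Illustrative values: one vector per input position.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
decoder_state = [2.0, 0.0]  # most similar to the first encoder state

weights, context = attend(decoder_state, encoder_states)
print([round(w, 3) for w in weights])
print([round(c, 3) for c in context])
```

Because the weights are recomputed at every step, the decoder can lean on different input positions for different output items instead of a single fixed summary.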

Real-world example of Sequence-to-Sequence

A customer-support team wants to turn long, rambling support emails into a one-line summary that pops up next to each ticket. This is a clean sequence-to-sequence job: the input sequence is the full email, the output sequence is a short summary, and the two are very different lengths. A seq2seq model reads the entire email first — encoding "frustrated customer, double-charged, wants a refund, mentions order number" into an internal summary — and only then generates the condensed line: "Double-charge refund request, order #4471." It doesn't shorten word by word; it digests the whole message and produces a fresh, compact version. That same read-the-whole-thing-then-rewrite-it move powers the translation and summarization tools people use every day.

Frequently asked questions about Sequence-to-Sequence

What is the difference between sequence-to-sequence and a transformer?

They're related but not the same kind of thing. Sequence-to-sequence describes the *task shape* (take one sequence in, produce another sequence out) and the encoder-decoder structure for doing it. A transformer is a specific neural-network *architecture*. The two overlap: transformers are now the dominant way to *build* sequence-to-sequence systems, and the attention mechanism central to transformers grew out of seq2seq research. So a modern translation model is often both: it tackles a sequence-to-sequence task using a transformer architecture. Seq2seq is the job and broad approach; the transformer is one powerful engine for it.

How does sequence-to-sequence work?

It typically uses two components. An encoder reads the entire input sequence and compresses its meaning into an internal representation. A decoder then generates the output sequence one item at a time, each step guided by that representation and by the items it has already produced. Reading the whole input before writing any output is what lets the model handle length differences and reordering between input and output. Modern versions add attention, which lets the decoder focus on the most relevant parts of the input at each step instead of leaning on a single fixed summary.

What is sequence-to-sequence used for?

It's used wherever one ordered sequence must become another: machine translation (its original breakthrough), text summarization, question answering, speech-to-text transcription, and grammar correction, among others. More broadly, the encoder-decoder idea and the attention mechanism it popularized underpin much of modern language AI, including the architecture behind today's large language models. Anytime the job is "given this whole sequence, produce that whole sequence," sequence-to-sequence is the framing that fits.