Encoder
The part of a model that reads an input sequence and produces a fixed-size or per-token vector representation for downstream use.
In one line
The half of a model that turns raw input into vectors — no generation, just representation.
What it actually means
In an encoder-decoder transformer, the encoder takes the input (say, a source sentence), runs it through a stack of self-attention + FFN blocks, and outputs one vector per input token; the decoder then cross-attends to those vectors. In an encoder-only model like BERT, you skip the decoder entirely: the encoder output is the whole point. You use those vectors for classification (feed the [CLS] token's vector to a linear head), embedding/search (pool the token vectors into one), or named-entity recognition (per-token classification). Encoder-only models are bidirectional: every token can attend to every other token, which is why BERT beat same-era GPT models on tasks where you have the full input available up front.
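The two downstream uses above — one vector per token for a [CLS] head, one pooled vector for search — can be sketched with toy numbers. This is a shape-level illustration, not a real model: the 5×8 hidden matrix and the random linear head stand in for actual encoder outputs (BERT-base would give seq_len × 768).

```python
import numpy as np

# Toy encoder output: 5 tokens x 8 hidden dims (illustrative; BERT-base is 768).
# Row 0 plays the role of the [CLS] token.
hidden = np.random.default_rng(0).normal(size=(5, 8))

# Classification: feed the [CLS] vector to a (toy, random) 2-class linear head.
cls_vec = hidden[0]                 # shape (8,)
logits = cls_vec @ np.random.default_rng(1).normal(size=(8, 2))

# Embedding/search: mean-pool the per-token vectors into one sentence vector.
sentence_vec = hidden.mean(axis=0)  # shape (8,)

print(logits.shape, sentence_vec.shape)
```

Mean pooling is one common choice; some embedding models instead take the [CLS] vector directly or use weighted pooling.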
Why it matters
Dense retrieval, reranking, embedding models, and most production classifiers run on encoder-only transformers. For these tasks they're smaller, faster, and better at representation than decoder-only LLMs. If you're building search, a small encoder like BGE or E5 beats prompting a general-purpose LLM for embeddings on both cost and latency.
Example
from sentence_transformers import SentenceTransformer

enc = SentenceTransformer("BAAI/bge-small-en-v1.5")
vecs = enc.encode(["how do I reset my password?"])  # shape (1, 384): one vector per input string
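Downstream, those vectors are typically compared by cosine similarity. A minimal sketch of dense retrieval with hand-written stand-in vectors (real ones would come from an encoder like the one above; the document names and numbers here are illustrative):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in embeddings for a query and two documents.
query = np.array([0.9, 0.1, 0.2])
docs = {
    "reset password guide": np.array([0.85, 0.15, 0.25]),
    "pricing page":         np.array([0.10, 0.90, 0.30]),
}

# Dense retrieval = rank documents by similarity to the query vector.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # "reset password guide"
```

In production the document vectors are precomputed and stored in a vector database, so query time is one encoder forward pass plus a nearest-neighbor lookup.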
You’ll hear it when
- Picking an embedding model for a vector database.
- Discussing encoder-only vs decoder-only vs encoder-decoder.
- Training a classifier on top of BERT-style backbones.
- Comparing cross-encoders and bi-encoders for reranking.