Attention Mechanism
In short
The core innovation inside the Transformer architecture. It lets the model decide which parts of the input are relevant to each other when making a prediction.
When you read a sentence like “The cat sat on the mat because it was tired,” your brain automatically connects “it” back to “the cat” and not “the mat.” You’re paying attention to the right words. That’s essentially what the attention mechanism does — it figures out which words (or Tokens) in the input should pay attention to which other words.
Before attention came along, language models (typically recurrent networks like RNNs and LSTMs) processed text in order — word by word, left to right — and had a hard time remembering what happened earlier in long sequences. The attention mechanism changed that by letting the model look at the entire input at once and weigh how important each part is relative to every other part.
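That “weigh every part against every other part” step is concrete enough to sketch. Below is a minimal scaled dot-product attention in NumPy — the core computation inside a Transformer layer. The toy 3-token, 4-dimensional embeddings are made-up stand-ins, and the learned query/key/value projection matrices a real model would apply are omitted for brevity:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays — one row of query/key/value vectors per token."""
    d = Q.shape[-1]
    # Compare every position with every other position: (seq_len, seq_len) scores.
    scores = Q @ K.T / np.sqrt(d)
    # Softmax over each row turns raw scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mix of ALL value vectors — this is the
    # "look at the entire input at once" part.
    return weights @ V, weights

# Toy self-attention example: 3 tokens, 4-dimensional (random, made-up) embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))
out, weights = scaled_dot_product_attention(x, x, x)
```

Row `i` of `weights` tells you how much token `i` attends to each other token — in the “cat … it” example, a trained model would put most of that row's weight on “cat”.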
This is what made the Transformer so powerful for NLP. Instead of hoping the model remembers something from 200 words ago, attention explicitly creates connections between all positions in the text. It’s the reason LLMs can handle long conversations and maintain context across thousands of Tokens.
Related
- Transformer - attention is the core piece of this architecture
- LLMs - powered by attention
- Tokens - what the attention mechanism operates over
- Context Window - the span of Tokens that attention can connect