
Mask Debugger

Build and verify padding, causal, decoder, packed-document, and masked-softmax behavior for attention masks.

[Figure: Mask Debugger reference]
Review the mask rules until padding, future-token, and packed-document leakage are easy to recognize.

Q1: What does a key-padding mask prevent in attention?

Tags: padding-mask, attention
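A key-padding mask hides padded key positions so that no query assigns attention weight to them. Every query row of a sequence shares the same key mask. The sketch below assumes right-padded sequences described by their lengths and uses pure-Python lists of booleans (True = may attend). `key_padding_mask` and `expand_to_queries` are hypothetical helper names, not a library API.

```python
def key_padding_mask(lengths, max_len):
    """Per-sequence key mask: True where key position j holds a real token.

    Assumes right padding: the first `n` positions of each sequence are real.
    """
    return [[j < n for j in range(max_len)] for n in lengths]


def expand_to_queries(key_mask, num_queries):
    """Repeat one sequence's key mask for every query row -> [query][key]."""
    return [list(key_mask) for _ in range(num_queries)]


# Batch of two sequences padded to length 3: lengths 3 and 1.
masks = key_padding_mask([3, 1], 3)
full = expand_to_queries(masks[1], 3)  # every query sees only key 0
```

Because the mask acts on keys (columns), outputs for padded query positions are still computed. They are typically ignored downstream, for example by masking the loss.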

Q2: In decoder self-attention, which positions may token i attend to?

Tags: causal-mask, decoder
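Token i may attend to itself and to every earlier position (j <= i). It must never attend to a later one. The sketch below builds that lower-triangular mask and adds a small checker for future-token leakage. Both helpers, `causal_mask` and `leaks_future`, are hypothetical names used only for illustration.

```python
def causal_mask(seq_len):
    """allowed[i][j] is True when query i may attend to key j, i.e. j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def leaks_future(allowed):
    """Return True if any query i is allowed to see a key j > i."""
    return any(
        allowed[i][j]
        for i in range(len(allowed))
        for j in range(i + 1, len(allowed[i]))
    )
```

For example, `causal_mask(3)` gives rows `[True, False, False]`, `[True, True, False]`, `[True, True, True]`. An all-True mask, as used in an encoder, would be flagged by `leaks_future`.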

Q3: Why do decoder masks usually combine target padding and causal masking?

Tags: decoder, mask-merge
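The two masks answer different questions: is key j in the future, and is key j padding? The decoder mask is therefore their elementwise AND. With left padding, which is common when batching generation, causal masking alone would still let real tokens attend to the leading pad tokens. This sketch assumes a per-sequence boolean list `is_real` (True = real token). `decoder_mask` is a hypothetical helper.

```python
def decoder_mask(is_real):
    """allowed[i][j] is True only if key j is not in the future AND is a real token."""
    n = len(is_real)
    return [[(j <= i) and is_real[j] for j in range(n)] for i in range(n)]


# Left-padded example: position 0 is padding.
for row in decoder_mask([False, True, True]):
    print(row)
```

In the left-padded example the first row is fully masked, because the pad query has no visible key. A naive softmax over such a row takes exp of all -inf scores and returns NaN. Implementations therefore need an explicit policy for fully masked rows, such as returning zeros.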

Q4: What extra rule is needed when multiple documents are packed into one sequence?

Tags: sample-packing, attention
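When several documents are packed into one sequence, the causal rule alone is not enough. A token in the second document could still attend to tokens of the first. The extra rule is that token i may attend to key j only if j <= i and both positions belong to the same document. This yields a block-diagonal causal mask. The sketch assumes each position carries a document id. `doc_ids_from_lengths` and `packed_causal_mask` are hypothetical helpers.

```python
def doc_ids_from_lengths(lengths):
    """[2, 3] -> [0, 0, 1, 1, 1]: one document id per packed position."""
    ids = []
    for doc, n in enumerate(lengths):
        ids.extend([doc] * n)
    return ids


def packed_causal_mask(doc_ids):
    """Causal AND same-document: blocks future tokens and cross-document leakage."""
    n = len(doc_ids)
    return [[j <= i and doc_ids[i] == doc_ids[j] for j in range(n)] for i in range(n)]


mask = packed_causal_mask(doc_ids_from_lengths([2, 3]))
```

Each document gets its own lower-triangular block. Every entry outside the blocks is False, so no information crosses a document boundary.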

Q5: Where should an attention mask be applied to make blocked probabilities zero?

Tags: softmax, attention
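The mask belongs on the attention scores before the softmax. Blocked scores are set to -inf (many implementations use a large negative finite value instead), so exp() sends them to exactly zero and the remaining probabilities still sum to 1. Zeroing probabilities after the softmax leaves rows that no longer sum to 1 unless they are renormalized. The pure-Python sketch below compares the two orderings. The function names and the zeros-for-fully-masked-rows policy are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax; exp(-inf) evaluates to 0.0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mask_before_softmax(scores, allowed):
    """Correct ordering: blocked scores become -inf, then softmax renormalizes."""
    if not any(allowed):
        # Assumed policy for fully masked rows: avoid NaN, return all zeros.
        return [0.0] * len(scores)
    masked = [s if ok else float("-inf") for s, ok in zip(scores, allowed)]
    return softmax(masked)


def mask_after_softmax(scores, allowed):
    """Incorrect ordering: blocked entries are zero, but the row no longer sums to 1."""
    return [p if ok else 0.0 for p, ok in zip(softmax(scores), allowed)]


scores = [0.0, 0.0, 0.0, 0.0]
allowed = [True, True, False, False]
```

With equal scores and two visible keys, masking before the softmax gives `[0.5, 0.5, 0.0, 0.0]`. Masking after it gives `[0.25, 0.25, 0.0, 0.0]`, whose row sum is only 0.5.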