
Mask Requirements Review

Recall why Transformer training needs padding masks and sequence masks before attention probabilities are computed.
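The phrase "before attention probabilities are computed" means the mask acts on the raw attention scores, ahead of the softmax. Below is a minimal pure-Python sketch of that idea for a single row of scores. It assumes a boolean `keep` list where True means "may attend"; the helper name `masked_softmax` is hypothetical and not taken from the lesson, and real libraries differ on whether True means keep or block.

```python
import math

def masked_softmax(scores, keep):
    """Softmax over one row of attention scores.

    Positions where keep[i] is False are set to -inf before the
    exponentials are taken, so they receive probability exactly 0.
    Assumes at least one position is kept.
    """
    masked = [s if k else float("-inf") for s, k in zip(scores, keep)]
    m = max(masked)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

probs = masked_softmax([2.0, 1.0, 0.5], [True, True, False])
print([round(p, 3) for p in probs])  # prints [0.731, 0.269, 0.0]
```

Masking after the softmax instead would leave the kept probabilities summing to less than 1, which is why the mask is applied to the scores first.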

Review the motivation for masks until you can easily tell apart the deviation caused by padding tokens and the leakage of future tokens.

Q1: In the Mask lesson, what is a mask in the general machine-learning sense?

Tags: mask, definition

Q2: What are the two common mask operations used in self-attention models?

Tags: padding-mask, causal-mask

Q3: Why do variable-length sequences create a masking requirement inside a training batch?

Tags: padding, batching
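As a study aid for this card: when sequences of different lengths are padded to a common length so they fit in one batch, the padded positions must be marked so attention ignores them. The sketch below builds such a padding mask from per-sequence lengths. The helper name `padding_mask` and the convention that True marks a real token are assumptions for illustration, not the lesson's code.

```python
def padding_mask(lengths, max_len):
    """Return one boolean row per sequence: True for real tokens,
    False for padding positions added to reach max_len."""
    return [[pos < n for pos in range(max_len)] for n in lengths]

# Two sequences of length 3 and 5, padded to a common length of 5.
mask = padding_mask([3, 5], 5)
for row in mask:
    print(row)
```

The first row ends in two False entries, so those padded key positions would receive zero attention once the mask is applied to the scores.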

Q4: Why can a decoder cheat if it receives the entire target sentence without a sequence mask?

Tags: decoder, information-leakage
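For this card, a sequence (causal) mask is usually a lower-triangular pattern: the query at position i may attend only to keys at positions 0 through i, so the decoder cannot read the target tokens it is supposed to predict. A minimal sketch, assuming True means "may attend"; the helper name `causal_mask` is hypothetical.

```python
def causal_mask(size):
    """Lower-triangular boolean mask: row i allows keys 0..i only,
    hiding every future position j > i."""
    return [[j <= i for j in range(size)] for i in range(size)]

mask = causal_mask(4)
for row in mask:
    print(row)
```

Row 0 allows only itself, and the last row allows the full prefix, which mirrors generating one token at a time.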

Q5: How does the lesson summarize the difference between padding masks and sequence masks?

Tags: padding-mask, sequence-mask
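One practical way to see the difference (a sketch, not necessarily the lesson's own summary): a padding mask depends only on which key positions hold real tokens, while a sequence mask depends on the order of query and key positions. In decoder self-attention the two are typically combined with a logical AND. The helper name `combined_mask` is hypothetical, and True is assumed to mean "may attend".

```python
def combined_mask(length, max_len):
    """Mask for one padded sequence of real length `length`.

    Query i may attend to key j only if j <= i (sequence/causal mask)
    and key j is a real token rather than padding (padding mask).
    """
    key_is_real = [j < length for j in range(max_len)]
    return [[j <= i and key_is_real[j] for j in range(max_len)]
            for i in range(max_len)]

mask = combined_mask(3, 5)
for row in mask:
    print(row)
```

Even the padded query rows (3 and 4) keep at least one real key, which keeps a masked softmax over each row well defined.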