Review deck

Padding Mask And Softmax Review

Recall how padding masks remove artificial tokens from the attention softmax and weighted value sum.

Reference figure: the padding-mask path from pad token to masked score to zero softmax probability.
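
The path from pad token to masked score to zero softmax probability can be traced with a small numeric sketch. Everything below is an illustrative assumption rather than the lesson's own code: the scores and value vectors are made up, the helper name `masked_softmax` is hypothetical, and the setup is a single query attending over four key positions, the last two of which are padding.

```python
import math

def masked_softmax(scores, is_pad):
    """Softmax over scores after writing -inf into padded positions.

    Hypothetical helper; assumes at least one position is not padding.
    """
    masked = [float("-inf") if pad else s for s, pad in zip(scores, is_pad)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) is exactly 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw attention scores for one query over four key positions.
scores = [2.0, 1.0, 0.0, 0.0]
is_pad = [False, False, True, True]  # the last two positions are padding

# Without a mask, the zero scores still receive weight because exp(0) = 1.
unmasked = masked_softmax(scores, [False] * 4)

# With the mask, the padded positions receive exactly zero probability.
masked = masked_softmax(scores, is_pad)

# Made-up value vectors; the padded rows are deliberately large to show
# that they drop out of the weighted sum once their weights are zero.
values = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0], [9.0, 9.0]]
context = [sum(w * v[d] for w, v in zip(masked, values)) for d in range(2)]

print("unmasked weights:", unmasked)
print("masked weights:  ", masked)
print("context vector:  ", context)
```

In this sketch the masked weights still sum to one over the real positions, and the context vector depends only on the first two value vectors, however large the padded rows are.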

Q1: Why can zero-valued padding tokens still distort softmax attention?

Tags: softmax, padding-mask

Q2: What value is commonly written into the mask at padding (filler-token) positions before softmax?

Tags: negative-infinity, softmax

Q3: What are the four high-level steps for applying a padding mask in attention?

Tags: attention, implementation

Q4: After a padding mask has worked correctly, what happens to padded value vectors in the weighted sum?

Tags: value-vectors, weighted-sum

Q5: Why does the lesson also care about padded positions during loss and backpropagation?

Tags: loss, backpropagation