Review deck
Mask Data Stream And PyTorch Review
Recall where src_mask and tgt_mask flow through the encoder, decoder self-attention, and decoder cross-attention, and how the PyTorch APIs handle masks.
Q1: Which mask does the encoder use in the Harvard EncoderDecoder flow?
Q2: Which mask is used by decoder self-attention?
Q3: Which mask is used by decoder cross-attention in the Harvard flow?
Q4: How do src_mask and tgt_mask differ in shape in the lesson's data-stream explanation?
Q5: How does PyTorch distinguish attention masks from key-padding masks?
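Worked sketch for checking your answers: the snippet below is a minimal illustration, not the lesson's own code. It assumes the mask conventions of the Harvard Annotated Transformer (True means "may attend") and a padding id of 0. The helper name make_masks and the toy tensors are hypothetical. The second half passes an attention mask and a key-padding mask to torch.nn.MultiheadAttention, where a boolean True means "blocked", so the Harvard-style mask is inverted first.

```python
import torch
import torch.nn as nn

PAD = 0  # assumed padding token id (illustrative)

def subsequent_mask(size):
    """Lower-triangular mask: True where query i may attend to key j (j <= i)."""
    return torch.tril(torch.ones(1, size, size, dtype=torch.bool))

def make_masks(src, tgt, pad=PAD):
    """Hypothetical helper building Harvard-style masks (True = keep)."""
    # Padding-only mask, broadcast over query positions: (batch, 1, src_len)
    src_mask = (src != pad).unsqueeze(-2)
    # Padding AND causal mask: (batch, tgt_len, tgt_len)
    tgt_mask = (tgt != pad).unsqueeze(-2) & subsequent_mask(tgt.size(-1))
    return src_mask, tgt_mask

src = torch.tensor([[5, 6, 7, PAD]])  # one source sentence, last token padded
tgt = torch.tensor([[1, 2, 3]])       # one target sentence, no padding
src_mask, tgt_mask = make_masks(src, tgt)
print(src_mask.shape)  # torch.Size([1, 1, 4])
print(tgt_mask.shape)  # torch.Size([1, 3, 3])

# PyTorch API: two separate arguments with the opposite boolean convention.
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(1, 3, 8)                # (batch, tgt_len, embed_dim)
attn_mask = ~subsequent_mask(3)[0]      # (L, S); True = position is blocked
key_padding_mask = tgt == PAD           # (N, S); True = key is padding
out, weights = mha(x, x, x,
                   attn_mask=attn_mask,
                   key_padding_mask=key_padding_mask)
print(weights.shape)  # torch.Size([1, 3, 3]), averaged over heads
```

In this sketch the first query position gets zero attention weight on later positions, which is a quick way to confirm the causal mask was applied with the right polarity.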