Review deck

Advanced Masking And Sample Packing Review

Recall sample-packing masks, block-diagonal attention, packing strategies, and the rank-collapse implications covered in the advanced section of the Mask lesson.

Review advanced masking until packed documents, block-diagonal masks, and rank-collapse tradeoffs are easy to explain.

Q1: Why does long-context training make naive batch padding especially wasteful?

Tags: long-context, padding
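To make the waste concrete, here is a minimal sketch that measures the fraction of token slots spent on padding when every sequence in a batch is padded to the longest one. The function name `padding_waste` and the batch lengths are hypothetical, chosen only for illustration.

```python
from typing import List

def padding_waste(lengths: List[int]) -> float:
    """Fraction of token slots that are padding when every sequence in the
    batch is padded to the longest one (naive batch padding)."""
    if not lengths:
        return 0.0
    total_slots = len(lengths) * max(lengths)
    if total_slots == 0:
        return 0.0
    real_tokens = sum(lengths)
    return 1.0 - real_tokens / total_slots

# Hypothetical batch: one long document forces every row to 8,000 slots.
batch = [100, 2_000, 8_000, 300]
print(f"padding fraction: {padding_waste(batch):.3f}")  # prints: padding fraction: 0.675
```

The longer the maximum context, the more a single long document inflates every other row in the batch, so most of the compute in this hypothetical batch goes to padding.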

Q2: What is sample packing in the Mask lesson?

Tags: sample-packing, attention-mask
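A minimal greedy packer helps show what "packing" produces. It assumes documents are already tokenized into integer lists and that no document is split across rows. The function name `pack_documents` and the segment-id convention (0 marks padding, 1, 2, ... label documents within a row) are hypothetical choices for illustration, not the lesson's implementation.

```python
from typing import List, Tuple

def pack_documents(
    docs: List[List[int]], max_len: int, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Greedily concatenate documents, in order, into rows of max_len tokens.

    Returns (token_rows, segment_rows). Within a row, segment ids 1, 2, ...
    label the documents in order; 0 marks trailing padding.
    """
    token_rows: List[List[int]] = []
    segment_rows: List[List[int]] = []
    tokens: List[int] = []
    segments: List[int] = []
    doc_in_row = 0

    def flush() -> None:
        # Pad the current row to max_len and store it.
        pad = max_len - len(tokens)
        token_rows.append(tokens + [pad_id] * pad)
        segment_rows.append(segments + [0] * pad)

    for doc in docs:
        if len(doc) > max_len:
            raise ValueError("document longer than max_len; split or truncate it first")
        if tokens and len(tokens) + len(doc) > max_len:
            flush()  # the next document does not fit: start a new row
            tokens, segments, doc_in_row = [], [], 0
        doc_in_row += 1
        tokens += doc
        segments += [doc_in_row] * len(doc)
    if tokens:
        flush()
    return token_rows, segment_rows

rows, segs = pack_documents([[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]], max_len=6)
print(rows)  # [[1, 2, 3, 4, 5, 0], [6, 7, 8, 9, 10, 0]]
print(segs)  # [[1, 1, 1, 2, 2, 0], [1, 1, 1, 1, 2, 0]]
```

The segment ids are the bookkeeping that a packed attention mask is built from; variable-length attention kernels typically consume equivalent information as per-document offsets instead.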

Q3: Why does packed training need a block-diagonal attention mask?

Tags: block-diagonal-mask, document-boundaries
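Without a document-aware mask, a token in one packed document could attend to tokens of the previous, unrelated document in the same row. The sketch below builds a boolean mask from per-token segment ids, assuming causal language-model training and the hypothetical convention that segment id 0 marks padding. The function name is illustrative only.

```python
from typing import List

def block_diagonal_causal_mask(segment_ids: List[int]) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Attention is allowed only within the same non-padding document (the
    diagonal blocks) and only to earlier or equal positions (causal).
    """
    n = len(segment_ids)
    return [
        [
            segment_ids[i] != 0
            and segment_ids[i] == segment_ids[j]
            and j <= i
            for j in range(n)
        ]
        for i in range(n)
    ]

# Two packed documents (ids 1 and 2) followed by one padding slot (id 0).
for row in block_diagonal_causal_mask([1, 1, 2, 2, 0]):
    print("".join("1" if allowed else "." for allowed in row))
# 1....
# 11...
# ..1..
# ..11.
# .....
```

Real implementations usually express this as an additive bias of negative infinity or pass document boundaries to a variable-length kernel. A row that is entirely masked (the padding slot above) needs special handling, such as ignoring its loss, because a softmax over only masked entries is undefined.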

Q4: What tradeoff does the lesson associate with packing strategies such as FixedLengthPacking, MultiPack, and SortedPacking?

Tags: packing-strategy, load-balance
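The tradeoff between order and density can be illustrated with two generic bin-packing heuristics: order-preserving next-fit and first-fit decreasing, which sorts by length first. These are standard heuristics swapped in for illustration; they are not necessarily how the lesson defines FixedLengthPacking, MultiPack, or SortedPacking, and the function names and lengths are hypothetical.

```python
from typing import List

def next_fit(lengths: List[int], capacity: int) -> List[List[int]]:
    """Order-preserving: open a new row whenever the next document does not fit."""
    bins: List[List[int]] = []
    used = 0
    for n in lengths:
        if n > capacity:
            raise ValueError("length exceeds capacity")
        if not bins or used + n > capacity:
            bins.append([])
            used = 0
        bins[-1].append(n)
        used += n
    return bins

def first_fit_decreasing(lengths: List[int], capacity: int) -> List[List[int]]:
    """Sort longest first, then put each document in the first row with room."""
    bins: List[List[int]] = []
    free: List[int] = []
    for n in sorted(lengths, reverse=True):
        if n > capacity:
            raise ValueError("length exceeds capacity")
        for k, space in enumerate(free):
            if n <= space:
                bins[k].append(n)
                free[k] -= n
                break
        else:  # no existing row had room
            bins.append([n])
            free.append(capacity - n)
    return bins

def fill_ratio(bins: List[List[int]], capacity: int) -> float:
    """Fraction of token slots holding real tokens."""
    return sum(map(sum, bins)) / (len(bins) * capacity)

lengths = [6, 3, 6, 3, 2, 4]
for pack in (next_fit, first_fit_decreasing):
    bins = pack(lengths, capacity=8)
    print(pack.__name__, len(bins), f"{fill_ratio(bins, 8):.2f}")
# next_fit 5 0.60
# first_fit_decreasing 4 0.75
```

Sorting packs more densely, but it reorders data so similar lengths cluster together, which reduces shuffling randomness. Equal token counts also do not guarantee equal work: under a block-diagonal mask, the quadratic attention cost of a row scales roughly with the sum of squared document lengths in it, so balancing compute across devices is harder than balancing token counts.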

Q5: According to the advanced section, what role can attention masks play in rank-collapse behavior?

Tags: rank-collapse, local-attention
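A toy numerical sketch, not the lesson's experiment, can make the idea tangible. It stacks attention-only layers (no residual connections, MLPs, or LayerNorm, and random scores instead of learned queries and keys) and tracks how far token representations are from being identical, that is, from rank-one collapse. All names, sizes, and the random-score setup are hypothetical assumptions.

```python
import math
import random
from typing import Callable, List, Optional

Matrix = List[List[float]]

def random_masked_attention(n: int, allowed: Callable[[int, int], bool],
                            rng: random.Random) -> Matrix:
    """Row-stochastic attention: softmax of random scores, with masked
    entries set to -inf so they receive exactly zero weight."""
    rows = []
    for i in range(n):
        scores = [rng.gauss(0.0, 1.0) if allowed(i, j) else -math.inf for j in range(n)]
        top = max(scores)  # finite, since both masks below allow j == i
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def matmul(A: Matrix, X: Matrix) -> Matrix:
    """Matrix product A @ X."""
    return [[sum(a * x[k] for a, x in zip(row, X)) for k in range(len(X[0]))] for row in A]

def spread(X: Matrix) -> float:
    """Frobenius norm of X minus its mean token; 0 means all tokens are identical."""
    n, d = len(X), len(X[0])
    mean = [sum(row[k] for row in X) / n for k in range(d)]
    return math.sqrt(sum((row[k] - mean[k]) ** 2 for row in X for k in range(d)))

def collapse_ratio(window: Optional[int] = None, n: int = 32, d: int = 4,
                   layers: int = 8, seed: int = 0) -> float:
    """Spread after `layers` attention-only layers divided by the initial spread."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    if window is None:
        allowed = lambda i, j: True  # full (unmasked) attention
    else:
        allowed = lambda i, j: abs(i - j) <= window  # local band mask
    start = spread(X)
    for _ in range(layers):
        X = matmul(random_masked_attention(n, allowed, rng), X)
    return spread(X) / start

print(f"full attention: {collapse_ratio():.2e}")
print(f"local window=1: {collapse_ratio(window=1):.2e}")
```

In this toy setup, full attention drives the ratio toward zero within a few layers, while a narrow local band shrinks it far more slowly because each layer only mixes neighboring positions; with more layers the band would keep shrinking it too. A block-diagonal packing mask would instead pull each document toward its own vector rather than one shared vector. None of this models a trained transformer.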