Exploring the Transformer Series (11) --- Mask
Transformer masks: padding mask, sequence/causal mask, implementation details, data flow, and advanced sample-packing strategies.