Exploring the Transformer Series (10) --- Self-Attention
Self-attention in Transformers: principles, implementation details, scaling/softmax analysis, and modern optimization directions.