Exploring the Transformer Series (14) --- Residual Networks and Normalization
Residual connections and normalization in Transformers: ResNet intuition, BatchNorm vs LayerNorm, Pre-Norm vs Post-Norm, implementations, and recent variants.