Slowly Explaining the “Transformer” Dominating the AI Field (Day 2)

Introduction / Background

Understanding the Models Behind GPT-3, BERT, and T5: An Explanation of Transformers | AI News Media AINOW

  • Easy to understand
  • Self Attention
    • To understand a given word, the model infers which surrounding words are important and attends to them (a minimal sketch follows this list)
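
A rough illustration of that idea, as a minimal NumPy sketch of scaled dot-product self-attention (my own code, not from the AINOW article; the projection matrices Wq, Wk, Wv and the toy shapes are assumptions for demonstration):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of word vectors.

    X: (seq_len, d_model) word embeddings; Wq, Wk, Wv: (d_model, d_k) projections.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # how relevant each word is to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax -> attention weights per word
    return weights @ V                               # each word becomes a weighted mix of all words

# toy example: a 4-word sentence with d_model = 8 (shapes are arbitrary)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```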

Thorough Explanation of Transformer, Essential Knowledge for Natural Language Processing | DeepSquare

  • Things I don’t understand

The Transformer is based on the encoder-decoder model, and its defining features are its self-attention layers and position-wise fully connected layers. In other words, if you understand the following three components (plus two supporting ones), you can understand the model's structure, so I will explain them in order (a sketch of the two supporting pieces follows the list):

  • Encoder-Decoder Model
  • Attention
  • Position-wise fully connected layers
  • Character embedding and softmax
  • Positional encoding
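
The two supporting pieces are easy to show concretely. Below is a minimal NumPy sketch of sinusoidal positional encoding and a position-wise fully connected layer (my own code, not from the DeepSquare article; the sinusoidal formula is the one from the original Transformer paper, while the toy shapes, random weights, and ReLU inner width are illustrative assumptions):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from the original Transformer paper (d_model even)."""
    pos = np.arange(seq_len)[:, None]              # (seq_len, 1) positions
    i = np.arange(0, d_model, 2)[None, :]          # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)  # (seq_len, d_model/2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions use cosine
    return pe

def position_wise_ffn(X, W1, b1, W2, b2):
    """Two linear layers with ReLU, applied independently at each position."""
    return np.maximum(0, X @ W1 + b1) @ W2 + b2

# toy shapes: 10-token sentence, d_model = 16, inner FFN width 32
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 16))              # stand-in for the embedding output
x = emb + positional_encoding(10, 16)        # inject word-order information
W1, b1 = rng.normal(size=(16, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 16)), np.zeros(16)
print(position_wise_ffn(x, W1, b1, W2, b2).shape)  # (10, 16)
```

The point of "position-wise" is that the same two-layer network is applied to every position independently; word order enters the model only through the positional encoding added to the embeddings, since attention itself has no notion of sequence order.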