BERT and Transformer

About

BERT and Transformer have been all over my feed lately, so I took some time to study them.

Key papers:

- Attention Is All You Need (Vaswani et al., 2017)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018)

Transformer key points

Scaled dot-product attention

$$
Attention(Q, K, V) = softmax(\frac{Q K^T}{\sqrt{d_k}}) V
$$
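The formula above can be sketched directly in NumPy; this is a minimal illustration (my own toy shapes, not from the paper) of scaling the dot products by sqrt(d_k) before the softmax:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (n_queries, n_keys)
    return softmax(scores) @ V        # (n_queries, d_v)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 queries, d_k = 8
K = rng.normal(size=(6, 8))    # 6 keys
V = rng.normal(size=(6, 16))   # 6 values, d_v = 16
out = attention(Q, K, V)
print(out.shape)               # (4, 16)
```

Each output row is a convex combination of the value rows, since every softmax row sums to 1.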

Multi-head attention

$$
MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O \\
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
$$
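A sketch of the multi-head formula, again with toy dimensions of my own choosing: each head projects Q, K, V into a smaller subspace, runs the same attention, and the concatenated heads are projected back by W^O:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head_attention(Q, K, V, W_Q, W_K, W_V, W_O):
    # head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V); concat heads, project by W^O.
    heads = [attention(Q @ wq, K @ wk, V @ wv)
             for wq, wk, wv in zip(W_Q, W_K, W_V)]
    return np.concatenate(heads, axis=-1) @ W_O

rng = np.random.default_rng(0)
d_model, h = 16, 4
d_k = d_model // h                           # per-head dimension
X = rng.normal(size=(5, d_model))            # 5 tokens
W_Q = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_K = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_V = [rng.normal(size=(d_model, d_k)) for _ in range(h)]
W_O = rng.normal(size=(h * d_k, d_model))
out = multi_head_attention(X, X, X, W_Q, W_K, W_V, W_O)  # self-attention
print(out.shape)                             # (5, 16)
```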

Position-wise feed-forward network

$$
FFN(x) = \max(0, xW_1 + b_1) W_2 + b_2
$$
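The FFN is just two linear layers with a ReLU in between, applied to each position independently. A minimal sketch (the paper uses d_model = 512 and d_ff = 2048; smaller toy sizes here):

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    # FFN(x) = max(0, x W1 + b1) W2 + b2  -- ReLU between two linear layers
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
d_model, d_ff = 16, 64
W1 = rng.normal(size=(d_model, d_ff)); b1 = np.zeros(d_ff)
W2 = rng.normal(size=(d_ff, d_model)); b2 = np.zeros(d_model)
x = rng.normal(size=(5, d_model))            # 5 positions
out = ffn(x, W1, b1, W2, b2)
print(out.shape)                             # (5, 16)
```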

Transformer

Transformer as a language model

OpenAI GPT

BERT paper key points

Input: the man went to the [MASK1] . he bought a [MASK2] of milk.
Labels: [MASK1] = store; [MASK2] = gallon
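The masking step that produces examples like the one above can be sketched as follows. Per the BERT paper, ~15% of positions are chosen as prediction targets; of those, 80% become [MASK], 10% are replaced by a random token, and 10% are left unchanged. The function name and toy vocabulary are my own:

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    # BERT's masked-LM corruption: select ~mask_prob of positions as targets;
    # of those, 80% -> [MASK], 10% -> random token, 10% -> unchanged.
    rng = rng or random.Random(0)
    masked, labels = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok                   # the original token is the label
            r = rng.random()
            if r < 0.8:
                masked[i] = "[MASK]"
            elif r < 0.9:
                masked[i] = rng.choice(vocab) # random replacement
            # else: keep the original token unchanged
    return masked, labels

vocab = ["store", "gallon", "milk", "man", "went"]
tokens = "the man went to the store . he bought a gallon of milk".split()
masked, labels = mask_tokens(tokens, vocab, mask_prob=0.5)
```

Keeping 10% of targets unchanged forces the model to produce a useful representation for every position, since it cannot tell which tokens were corrupted.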

Related key papers