Why do I need to add positional embeddings to my transformer inputs?

Why do I need to add positional embeddings to my transformer inputs? I thought the transformer was so smart; how does it not know about input positions a priori?
2 Answers
Positional embeddings are indeed crucial for transformers. The reason is that transformers, unlike RNNs and CNNs, have no implicit notion of the order or position of the input data. The self-attention mechanism computes attention weights purely from the content of the tokens, so permuting the input sequence simply permutes the outputs; the architecture is permutation-equivariant, or "order-agnostic".

This property is useful for certain tasks, but for many others (like language understanding) the position or order of elements in the input sequence carries crucial information: in a sentence, words take much of their meaning from their surroundings, and that positional context is essential to understanding the sentence. That is why we add positional embeddings to the inputs: they give the transformer knowledge about the relative or absolute position of each token in the sequence, which helps it make better decisions.

In essence, even though transformers are powerful, they do not inherently understand the concept of "position" in a sequence, and they require positional embeddings for tasks where the order of the data matters.
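One common scheme is the fixed sinusoidal encoding from the original "Attention Is All You Need" paper. Below is a minimal NumPy sketch of it; the function name and the sizes used (10 tokens, d_model = 64) are illustrative choices, and the code assumes d_model is even:

```python
# A minimal NumPy sketch of the sinusoidal positional encoding from
# "Attention Is All You Need". Names and sizes are illustrative;
# assumes d_model is even.
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Return a (max_len, d_model) matrix; row t encodes position t."""
    positions = np.arange(max_len)[:, None]              # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]             # (1, d_model // 2)
    angles = positions / (10000.0 ** (dims / d_model))   # (max_len, d_model // 2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions: cosine
    return pe

# Stand-in for a learned token-embedding lookup: 10 tokens, d_model = 64.
token_embeddings = np.random.randn(10, 64)
transformer_inputs = token_embeddings + sinusoidal_positional_encoding(10, 64)
```

Many models instead learn the positional embedding table jointly with the rest of the network; the sinusoidal version was motivated in the paper partly by the hope that it would extrapolate to sequence lengths longer than those seen during training.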
Answered on August 5, 2023.
The Transformer model, by design, does not take the position or order of its inputs into account. This is due to its self-attention mechanism, which treats the inputs as a 'set' and ignores their sequence. Positional embeddings are added to give the model a sense of order: they carry additional information about the position of each word in the sequence. Without them, a Transformer cannot differentiate between different orderings of the same words, which matters for many tasks like natural language understanding. So positional embeddings are not a workaround for a flaw in the Transformer, but a deliberate design choice for handling sequential inputs in tasks where order is important. The snippet below makes the 'set' behaviour concrete.
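Here is a small, self-contained demonstration of that permutation-equivariance, written in NumPy with illustrative shapes and with the query/key/value projections simplified to the identity. Shuffling the input rows of a plain self-attention layer shuffles the output rows in exactly the same way, so without positional embeddings no ordering information ever enters the computation:

```python
# Demonstration: plain self-attention is permutation-equivariant.
# Shapes and the identity Q/K/V projections are illustrative simplifications.
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head self-attention with identity projections, for illustration."""
    scores = x @ x.T / np.sqrt(x.shape[-1])                   # (seq, seq) similarities
    weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # row-wise softmax
    return weights @ x

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))    # 5 tokens, 8-dim embeddings, no positional info
perm = rng.permutation(5)

out = self_attention(x)
out_permuted = self_attention(x[perm])

# Permuting the input merely permutes the output: no order information is used.
assert np.allclose(out[perm], out_permuted)
```

Once positional embeddings are added to x before the attention call, the two outputs differ, because each row now carries information about where it sits in the sequence.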
Answered on August 5, 2023.
