RE: Why do I need to add positional embeddings to my transformer inputs?

Why do I need to add positional embeddings to my transformer inputs? I thought the transformer was so smart, how does it not know about input positions a priori?
Positional embeddings are indeed crucial for transformers. Unlike RNNs and CNNs, a transformer has no built-in notion of the order or position of its inputs. Its self-attention mechanism attends to every token in the sequence in the same way regardless of where that token appears, which makes the architecture effectively order-agnostic (permutation-invariant).

That property is useful for some tasks, but for many others, especially language understanding, the order of elements carries crucial information. In a sentence, word order changes meaning: "the dog chased the cat" and "the cat chased the dog" contain the same tokens but say different things.

Positional embeddings are how we inject that ordering information. They encode the absolute or relative position of each token and are combined with the token embeddings, so the model can tell apart the same token appearing in different places and make better decisions. In essence, even though transformers are powerful, they do not inherently understand the concept of "position" in a sequence, and they need positional embeddings for tasks where the order of the data matters.
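To make this concrete, here is a minimal sketch in plain NumPy (function name and dimensions are my own, for illustration) of the fixed sinusoidal encoding described in the original "Attention Is All You Need" paper. Even indices get a sine, odd indices get a cosine, and each pair of dimensions uses a different frequency:

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # positions: (seq_len, 1), dims: (1, d_model)
    positions = np.arange(seq_len)[:, np.newaxis]
    dims = np.arange(d_model)[np.newaxis, :]
    # Each pair of dimensions (2i, 2i+1) shares one frequency 1 / 10000^(2i/d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates          # (seq_len, d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])   # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])   # odd dimensions: cosine
    return encoding

# Usage: the encoding is simply added element-wise to the token embeddings
# before the first attention layer, e.g.
#   inputs = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)

Learned positional embeddings (a trainable lookup table per position) work the same way at the point of use; relative schemes such as RoPE or ALiBi inject position differently, but the motivation is identical.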
Answered on August 5, 2023.
