RE: Why do I need to add positional embeddings to my transformer inputs?

Why do I need to add positional embeddings to my transformer inputs? I thought the transformer was so smart, how does it not know about input positions a priori?
Positional embeddings are indeed crucial for transformers. Unlike RNNs and CNNs, a transformer has no built-in notion of the order or position of its inputs. Its self-attention mechanism attends to every token in the sequence in the same way regardless of where that token appears, which makes the architecture effectively order-agnostic (permutation-invariant).

That property is useful for some tasks, but for many others, especially language understanding, the order of elements carries crucial information. In a sentence, word order changes meaning: "the dog chased the cat" and "the cat chased the dog" contain the same tokens but say different things.

Positional embeddings are how we inject that ordering information. They encode the absolute or relative position of each token and are combined with the token embeddings, so the model can tell apart the same token appearing in different places and make better decisions. In essence, even though transformers are powerful, they do not inherently understand the concept of "position" in a sequence, and they need positional embeddings for tasks where the order of the data matters.
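To make this concrete, here is a minimal sketch in plain NumPy (function name and dimensions are my own, for illustration) of the fixed sinusoidal encoding described in the original "Attention Is All You Need" paper. Even indices get a sine, odd indices get a cosine, and each pair of dimensions uses a different frequency:

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # positions: (seq_len, 1), dims: (1, d_model)
    positions = np.arange(seq_len)[:, np.newaxis]
    dims = np.arange(d_model)[np.newaxis, :]
    # Each pair of dimensions (2i, 2i+1) shares one frequency 1 / 10000^(2i/d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates          # (seq_len, d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])   # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])   # odd dimensions: cosine
    return encoding

# Usage: the encoding is simply added element-wise to the token embeddings
# before the first attention layer, e.g.
#   inputs = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)

Learned positional embeddings (a trainable lookup table per position) work the same way at the point of use; relative schemes such as RoPE or ALiBi inject position differently, but the motivation is identical.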
Answered on August 5, 2023.
