RE: What’s the training objective for a BART transformer model?
I was wondering this while reading the paper at
https://arxiv.org/abs/1910.13461
(Note: the original link pointed to arXiv:1810.04805, which is the BERT paper; the BART paper is arXiv:1910.13461.)
The training objective for BART (Bidirectional and Auto-Regressive Transformers) is denoising: the model is trained to reconstruct the original document from a corrupted version of it, minimizing the cross-entropy (negative log-likelihood) between the decoder's output and the original text. The corruption is produced by arbitrary noising functions; the paper experiments with token masking, token deletion, text infilling (spans of text replaced by a single mask token), sentence permutation, and document rotation, and finds that text infilling, optionally combined with sentence permutation, works best. The bidirectional encoder reads the corrupted input, and the autoregressive decoder then predicts the original sequence token by token. This objective makes BART effective on many downstream tasks, such as question answering, summarization, and translation, because it learns to model both the content and the structure of input sequences.
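For concreteness, here is a minimal sketch of that reconstruction loss using the Hugging Face `transformers` library (my implementation choice, not something the paper prescribes). The example sentence and the hand-made corruption are placeholders for the paper's automated noising functions:

```python
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

original = "The quick brown fox jumps over the lazy dog."
# Hand-corrupted input: a span replaced by a single <mask> token,
# mimicking the paper's "text infilling" noising function.
corrupted = "The quick <mask> over the lazy dog."

batch = tokenizer(corrupted, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt").input_ids

# Passing `labels` makes the model compute the cross-entropy between
# the decoder's predictions and the original (uncorrupted) sequence,
# i.e. the reconstruction loss BART is pretrained to minimize.
outputs = model(input_ids=batch.input_ids,
                attention_mask=batch.attention_mask,
                labels=labels)
print(outputs.loss.item())
```

In actual pretraining the corruption is sampled on the fly (e.g. masked spans drawn with lengths from a Poisson distribution) rather than written by hand, and the loss above is backpropagated over large text corpora.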