What’s the training objective for a BART transformer model?

I was wondering this while reading the paper at



2 Answer(s)
The training objective for a BART (Bidirectional and Auto-Regressive Transformers) model is to maximize the likelihood of the original sequence given a corrupted version of it — equivalently, to minimize the cross-entropy between the decoder's output and the original text. The corrupted input is produced by noising functions such as token masking, token deletion, text infilling (replacing a span with a single mask token), sentence permutation, and document rotation. The bidirectional encoder reads the corrupted sequence, and the autoregressive decoder must reconstruct the original one token by token. Because the model learns to recover the context and structure of damaged input, this objective makes BART useful for many downstream tasks, such as question answering, summarization, and translation.
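To make the corruption step concrete, here is a minimal sketch of BART-style "text infilling" in plain Python. It is a simplified assumption-laden illustration (one hand-picked span; the actual BART recipe samples span lengths from a Poisson distribution and may mask several spans), not the real preprocessing code:

```python
# Sketch of BART-style "text infilling" corruption (simplified: one span
# per example, chosen by hand; real BART samples span lengths from a
# Poisson distribution and can mask multiple spans).
def corrupt(tokens, start, length, mask_token="<mask>"):
    """Replace a contiguous span of tokens with a single mask token.
    The training target is the ORIGINAL, uncorrupted sequence."""
    corrupted = tokens[:start] + [mask_token] + tokens[start + length:]
    return corrupted, tokens  # (model input, reconstruction target)

source = "the quick brown fox jumps over the lazy dog".split()
inp, target = corrupt(source, start=1, length=3)
print(inp)     # ['the', '<mask>', 'jumps', 'over', 'the', 'lazy', 'dog']
print(target)  # the original nine tokens
```

The model never sees how many tokens the mask replaced, so it must learn both what is missing and how long the missing span is.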
Answered on July 26, 2023.
"Bidirectional and Auto-Regressive Transformers (BART)" is a transformer-based machine learning model used primarily for sequence-to-sequence tasks. The training objective for this type of model lies in its method of training, which is conducted in two steps: pre-training and fine-tuning: 1. **Pre-training:** During the pre-training phase, the model takes in sequences of texts, then randomly masks some tokens and tries to predict them based only on their context. This pre-training phase's objective is for the model to learn context reasonings and gain a broad understanding of the language, which helps it provide more accurate predictions during the fine-tuning phase. 2. **Fine-tuning:** The fine-tuning phase then optimizes performance on the specific task at hand. The fine-tuning usually involves sequence classification, sequence generation or token classification tasks. The model is primed on the particular task using supervised learning and its objective is to minimize the loss function defined by this task. The model fine-tunes all its parameters during this task-specific training. In simpler terms, the primary training objective of BART is to reconstruct the original text after some noise (like masking of tokens or sentences) has been added to it. This helps the model in learning the context and dependencies within the sequence. It makes it capable of handling downstream tasks more effectively such as language understanding and translation, text generation, summarization, and more.
Answered on August 24, 2023.
