RE: What’s the training objective for a BART transformer model?
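BART (Bidirectional and Auto-Regressive Transformers) is trained as a denoising autoencoder: the input text is corrupted with a noising function, and the model learns to reconstruct the original text. Proposed by Facebook AI in 2019, it is a pre-training method for sequence-to-sequence models. Concretely, the training objective is to minimize a reconstruction loss, i.e. the cross-entropy between the decoder's output and the original, uncorrupted document.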
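To make the reconstruction objective concrete, here is a minimal sketch of a token-level cross-entropy in plain Python. The function name `reconstruction_loss` and the list-of-lists input format are assumptions made for illustration; real implementations compute this on tensors inside a deep learning framework.

```python
import math

def reconstruction_loss(logits, target_ids):
    """Mean cross-entropy between decoder scores and the original tokens.

    logits: one list of vocabulary scores per target position (hypothetical format).
    target_ids: index of the original, uncorrupted token at each position.
    """
    total = 0.0
    for scores, target in zip(logits, target_ids):
        # Numerically stable log-sum-exp over the vocabulary scores.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability assigned to the correct token.
        total += log_z - scores[target]
    return total / len(target_ids)

# With uniform scores over a 4-token vocabulary the loss equals log(4).
loss = reconstruction_loss([[0.0, 0.0, 0.0, 0.0]], [2])
```

The loss falls as the decoder puts more probability on the original token at each position, which is what "reconstructing the input" means in practice.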
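During pre-training, BART receives a sequence of tokens that has been corrupted by a noising function and tries to reconstruct the original sequence. The BART paper experiments with five kinds of corruption: token masking, token deletion, text infilling, sentence permutation, and document rotation. Because the decoder regenerates the whole sequence rather than only filling in masked positions, any kind of corruption can be applied, and the authors report their best results from combining text infilling with sentence permutation. The intent is for the model to learn more robust representations than it would from a single fixed masking scheme.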
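A few of these noising operations can be sketched on a plain list of tokens. This is an illustrative sketch, not BART's actual preprocessing code: the function names, the `<mask>` string, and passing the span explicitly are all assumptions. In the paper, span lengths for text infilling are sampled from a Poisson distribution (λ = 3), which is omitted here to keep the example deterministic.

```python
import random

MASK = "<mask>"  # placeholder symbol; the real token depends on the tokenizer

def token_masking(tokens, p, rng):
    """Replace each token with MASK independently with probability p."""
    return [MASK if rng.random() < p else t for t in tokens]

def token_deletion(tokens, p, rng):
    """Drop each token independently with probability p (no placeholder is left)."""
    return [t for t in tokens if rng.random() >= p]

def text_infilling(tokens, start, length):
    """Replace the span tokens[start:start+length] with a single MASK.

    A length of 0 inserts a MASK without removing any token.
    """
    return tokens[:start] + [MASK] + tokens[start + length:]

def document_rotation(tokens, start):
    """Rotate the sequence so that it begins at position `start`."""
    return tokens[start:] + tokens[:start]

rng = random.Random(0)
original = "the cat sat on the mat".split()
corrupted = text_infilling(original, start=1, length=3)
# corrupted is ['the', '<mask>', 'the', 'mat']
```

During training, `corrupted` would be the encoder input and `original` the target the decoder must reproduce. Note that with text infilling the model also has to work out how many tokens are missing, since a whole span collapses into one mask.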
The name reflects the architecture. BART's encoder is bidirectional, like BERT's: each token's representation is computed from both the preceding and the following tokens of the corrupted input. Its decoder is autoregressive, like GPT's: it generates the output left to right, predicting each token from the tokens before it while also attending to the encoder's output. So BART is not purely bidirectional; it pairs a bidirectional encoder with a unidirectional decoder, which gives it full context for reading the input and a natural way to generate text.
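The difference between bidirectional and left-to-right attention comes down to which positions each token is allowed to see. The following sketch builds the two self-attention masks as nested lists of booleans; the function names are made up for illustration, and real implementations build these masks as tensors.

```python
def encoder_mask(n):
    """Bidirectional: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]

def decoder_mask(n):
    """Causal: position i may attend only to positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]

enc = encoder_mask(3)
dec = decoder_mask(3)
# dec is:
# [[True, False, False],
#  [True, True,  False],
#  [True, True,  True ]]
```

In an encoder-decoder model like BART, the decoder additionally attends to all encoder outputs through cross-attention, so each generated token is conditioned on the entire corrupted input plus the tokens generated so far.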
Another advantage of BART is that it can be fine-tuned for a variety of downstream tasks such as question answering, text classification, summarization, and translation. The pre-trained encoder and decoder are both reused, so the model suits tasks that require generating text as well as tasks that only require understanding it.
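To summarize, BART's training objective is to reconstruct the original text from a corrupted version of it, by minimizing the cross-entropy between the decoder's output and the original tokens. Doing this well requires modeling context and grammatical structure, which in turn supports a wide range of downstream tasks.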