What’s the training objective for a BART transformer model?
The training objective for a BART (Bidirectional and Auto-Regressive Transformers) model is based on a sequence-to-sequence denoising autoencoding pre-training task. In simpler terms, it works in two steps:
1. Corruption: The input sequence is first corrupted with a randomly chosen noising function. In the original paper this takes the form of token masking, token deletion, text infilling, sentence permutation, or document rotation.
2. Restoration: The model then takes the noisy version as input and learns to recover the original, uncorrupted sequence (a minimal code sketch of both steps follows this list).
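To make the two steps concrete, here is a minimal sketch using the Hugging Face `transformers` library and the `facebook/bart-base` checkpoint (both are illustrative assumptions, not part of the original description): a span is masked out, and the pretrained model is asked to regenerate the full sentence.

```python
# Minimal corruption/restoration sketch (assumes the Hugging Face
# `transformers` library and the `facebook/bart-base` checkpoint).
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# Step 1 (corruption): replace a span with the tokenizer's mask token,
# imitating the text-infilling noise used during pre-training.
corrupted = f"BART is trained to {tokenizer.mask_token} from corrupted input."

# Step 2 (restoration): the decoder regenerates the sequence auto-regressively.
inputs = tokenizer(corrupted, return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```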
The bidirectional encoder captures both the left and right context of the corrupted input, while the restoration stage uses the standard transformer auto-regressive generation scheme: the decoder predicts each token of the original sequence based only on the previously generated tokens and the encoder's output. The training loss is the cross-entropy between the decoder's predictions and the original document.
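In notation (introduced here for illustration, not taken verbatim from the paper): writing g(·) for the noising function and x = (x_1, …, x_T) for the original token sequence, the pre-training loss is the reconstruction negative log-likelihood

```latex
\mathcal{L}(\theta) \;=\; -\,\mathbb{E}_{x \sim D}\,\sum_{t=1}^{T} \log p_\theta\!\left(x_t \mid x_{<t},\, g(x)\right)
```

i.e. the cross-entropy between the decoder's token predictions and the original document.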
Unlike a purely left-to-right model such as GPT, which conditions each prediction only on the preceding context, BART's encoder can associate words in both directions, improving its language understanding capabilities. Combined with the auto-regressive decoder, this gives it an understanding of the sequence as a whole and makes it robust on downstream tasks such as text generation, translation, summarization, and more.
To put this objective in context: BART was proposed by Facebook AI in 2019 as a denoising autoencoder for pre-training sequence-to-sequence models.
During training, BART tries to reconstruct the original token sequence after random noising has been applied to it. The noise can be any combination of operations such as token masking, token deletion, or text infilling. Because arbitrary noising functions can be applied, BART learns richer, more flexible representations than models pre-trained with a single fixed corruption scheme.
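As a rough sketch of what a single pre-training step looks like (again assuming the Hugging Face API and the `facebook/bart-base` checkpoint; the single masked span below is a simplified stand-in for the full noising pipeline), supplying `labels` makes the model compute exactly this reconstruction cross-entropy:

```python
# One simplified denoising training step (Hugging Face API assumed;
# the single masked span stands in for BART's full noising pipeline).
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

original = "The quick brown fox jumps over the lazy dog."
corrupted = f"The quick {tokenizer.mask_token} jumps over the lazy dog."

batch = tokenizer(corrupted, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt")["input_ids"]

# With `labels` supplied, the returned loss is the token-level cross-entropy
# between the decoder's predictions and the original (uncorrupted) sequence.
loss = model(input_ids=batch["input_ids"],
             attention_mask=batch["attention_mask"],
             labels=labels).loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```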
Bidirectionality is another important feature of BART. While models like GPT are unidirectional (they predict each word from the preceding words only), BART's encoder is bidirectional: it reads the corrupted input using both preceding and succeeding context, while the decoder remains auto-regressive. This combination results in better comprehension of language syntax and semantics.
Another advantage of BART is that it can be fine-tuned for a variety of downstream tasks such as question answering, text classification, summarization, and translation, because the model has already learned a rich understanding of sentence structure and language during its pre-training phase.
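For example, summarization with a checkpoint that has already been fine-tuned on news data reuses the same sequence-to-sequence interface (`facebook/bart-large-cnn` is assumed here as a publicly available fine-tuned checkpoint):

```python
# Summarization with an already fine-tuned BART checkpoint
# (`facebook/bart-large-cnn` is assumed to be available).
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

article = (
    "BART was introduced in 2019 as a denoising autoencoder for pre-training "
    "sequence-to-sequence models. It corrupts text with an arbitrary noising "
    "function and learns to reconstruct the original document."
)
inputs = tokenizer(article, return_tensors="pt", truncation=True)
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=40)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```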
To summarize, BART's training objective is to reconstruct original text from corrupted input by minimizing the cross-entropy between the decoder's output and the original sequence. The understanding of context and grammatical structure this builds is what facilitates such a rich array of downstream tasks.