RE: What’s the training objective for a BART transformer model?
The training objective for a BART (Bidirectional and Auto-Regressive Transformers) model is based on a sequence-to-sequence denoising autoencoding pre-training task. In simpler terms, it works in two steps:
1. Corruption: The model first applies an arbitrary noising function to the input text. In the BART paper this includes token masking, token deletion, text infilling (replacing whole spans with a single mask token), sentence permutation, and document rotation (a toy corruption sketch follows this list).
2. Restoration: The model then takes the corrupted text as input and is trained to reconstruct the original, uncorrupted sequence.
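For intuition, here is a minimal, word-level sketch of the corruption step using text infilling. It is purely illustrative: it operates on whitespace-split words, whereas BART actually corrupts subword tokens from its own vocabulary and samples span lengths from a Poisson distribution.

```python
import random

MASK = "<mask>"  # BART's mask token

def corrupt_with_infilling(text: str, mask_prob: float = 0.3) -> str:
    """Toy text infilling: replace random short spans with a single <mask>."""
    words = text.split()
    corrupted, i = [], 0
    while i < len(words):
        if random.random() < mask_prob:
            span = random.randint(1, 3)   # real BART samples span lengths from Poisson(3)
            corrupted.append(MASK)        # the whole span collapses to one mask token
            i += span
        else:
            corrupted.append(words[i])
            i += 1
    return " ".join(corrupted)

original = "BART is trained to reconstruct the original text from a corrupted version."
print(corrupt_with_infilling(original))
# e.g. "BART is trained to <mask> original text from <mask> corrupted version."
```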
The aim is for the encoder to capture both the left and right context of the corrupted input. In the restoration stage, the decoder uses the standard transformer auto-regressive generation scheme, predicting each token based only on the previously generated tokens (plus the encoder's output). The training loss is the cross-entropy between the decoder's predictions and the original document.
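Here is a minimal sketch of a single denoising step, assuming the Hugging Face `transformers` library and the `facebook/bart-base` checkpoint (the example strings are made up). Passing the original text as `labels` makes the library compute the token-level cross-entropy loss described above:

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

original = "BART is trained to reconstruct the original text from a corrupted version."
corrupted = "BART is trained to <mask> the original text from <mask> version."

# The encoder sees the corrupted text; the decoder is trained to regenerate the original.
inputs = tokenizer(corrupted, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt").input_ids

# With `labels` supplied, the model returns the cross-entropy between the decoder's
# autoregressive predictions and the original sequence.
outputs = model(**inputs, labels=labels)
print(outputs.loss)       # scalar denoising (reconstruction) loss
outputs.loss.backward()   # gradients for one pre-training step
```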
Unlike BERT, which uses a bidirectional encoder alone and predicts masked tokens independently, and unlike GPT, which conditions only on the left context, BART pairs a bidirectional encoder with an auto-regressive decoder. This encoder-decoder design lets it model the corrupted sequence as a whole while generating fluent output, which makes it effective on downstream tasks such as text generation, translation, summarization, and more.
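As a quick illustration of downstream use, this sketch runs summarization with a publicly available BART checkpoint fine-tuned on CNN/DailyMail (`facebook/bart-large-cnn`); the article text is invented for the example:

```python
from transformers import pipeline

# Summarization with a BART model fine-tuned for that task.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = (
    "BART is pre-trained as a denoising autoencoder and then fine-tuned on tasks "
    "such as summarization, translation, and question answering. Its bidirectional "
    "encoder and auto-regressive decoder suit sequence-to-sequence problems."
)
print(summarizer(article, max_length=40, min_length=10)[0]["summary_text"])
```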