What is XGPT?
Source | XGPT: Cross-modal Generative Pre-Training for Image Captioning |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
XGPT is a cross-modal generative pre-training method for image captioning. It pre-trains image caption generators through three novel generation tasks: image-conditioned masked language modeling (IMLM), image-conditioned denoising autoencoding (IDA), and text-conditioned image feature generation (TIFG). The pre-trained XGPT can then be fine-tuned, without any task-specific architecture modifications, to build strong image captioning models.
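To make the masked and denoising objectives more concrete, here is a toy sketch of how the text side of such training examples is commonly built. It is not the XGPT implementation: the `[MASK]` symbol, the mask ratio, the span length, and both helper functions are hypothetical choices made for illustration. The image features that would condition the model are omitted entirely.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the XGPT paper


def make_masked_lm_example(tokens, mask_ratio=0.15, seed=0):
    """Hide a random subset of caption tokens (masked-LM style).

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token the model would have to predict.
    The mask ratio is an assumed value for illustration only.
    """
    if not tokens:
        raise ValueError("caption must contain at least one token")
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, targets


def make_denoising_example(tokens, span_len=2, seed=0):
    """Corrupt one contiguous span (denoising-autoencoder style).

    Returns (corrupted_tokens, full_target): the target is the whole
    original caption, not only the corrupted positions.
    """
    if not tokens:
        raise ValueError("caption must contain at least one token")
    rng = random.Random(seed)
    span_len = min(span_len, len(tokens))
    start = rng.randrange(len(tokens) - span_len + 1)
    corrupted = list(tokens)
    corrupted[start:start + span_len] = [MASK] * span_len
    return corrupted, list(tokens)


caption = "a dog runs across the grass".split()
masked, targets = make_masked_lm_example(caption)
corrupted, full_target = make_denoising_example(caption)
print(masked, targets)
print(corrupted, full_target)
```

In this sketch the two helpers differ only in their target: the masked-LM variant asks for the hidden tokens alone, while the denoising variant asks for the full caption. In an image-conditioned setting, a model would receive image region features alongside the corrupted text; how XGPT does this is described in the paper and is not reproduced here.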