Viet-Anh on Software

What is: XGPT?

Source: XGPT: Cross-modal Generative Pre-Training for Image Captioning
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

XGPT is a method of cross-modal generative pre-training for image captioning. It is designed to pre-train caption generators (models that produce text from an image) through three novel generation tasks: image-conditioned masked language modeling (IMLM), image-conditioned denoising autoencoding (IDA), and text-conditioned image feature generation (TIGF). The pre-trained XGPT can then be fine-tuned, without any task-specific architecture modifications, to build strong image captioning models.
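The three pre-training tasks differ mainly in what the model is given and what it must generate. The Python sketch that follows shows this with toy input/target pairs. It is not XGPT's implementation. The `[MASK]` token, the helper names (`imlm_example`, `ida_example`, `tigf_example`), and the plain lists standing in for tokenized captions and detector region features are hypothetical. The masking schemes are also assumptions: IMLM is modeled as MASS-style span masking with only the span as the target, and IDA as a span collapsed to a single mask with the whole caption as the target.

```python
import random
from typing import Dict, List, Tuple

MASK = "[MASK]"  # hypothetical mask token, not XGPT's actual vocabulary

Feature = List[float]  # toy stand-in for one detected image region's features
Pair = Tuple[Dict[str, object], list]  # (model inputs, generation target)


def _pick_span(n: int, span_len: int, rng: random.Random) -> int:
    """Choose a random start index for a contiguous span of span_len tokens."""
    if not 0 < span_len <= n:
        raise ValueError("span_len must be between 1 and the caption length")
    return rng.randrange(0, n - span_len + 1)


def imlm_example(caption: List[str], image: List[Feature],
                 span_len: int, rng: random.Random) -> Pair:
    """IMLM (assumed MASS-style): the encoder sees the image plus a caption
    whose span is masked token-by-token; the target is only the masked span."""
    start = _pick_span(len(caption), span_len, rng)
    text_in = caption[:start] + [MASK] * span_len + caption[start + span_len:]
    return {"image": image, "text": text_in}, caption[start:start + span_len]


def ida_example(caption: List[str], image: List[Feature],
                span_len: int, rng: random.Random) -> Pair:
    """IDA (assumed): the span collapses to one mask token, so the model must
    also infer its length; the target is the full original caption."""
    start = _pick_span(len(caption), span_len, rng)
    text_in = caption[:start] + [MASK] + caption[start + span_len:]
    return {"image": image, "text": text_in}, list(caption)


def tigf_example(caption: List[str], image: List[Feature]) -> Pair:
    """TIGF: the caption is the only input; the image region features are
    the target (treated here as a regression target)."""
    return {"text": list(caption)}, [list(f) for f in image]


if __name__ == "__main__":
    rng = random.Random(0)
    caption = "a dog runs on the beach".split()
    image = [[0.1, 0.2], [0.3, 0.4]]  # toy region features
    print("IMLM", imlm_example(caption, image, 2, rng))
    print("IDA ", ida_example(caption, image, 2, rng))
    print("TIGF", tigf_example(caption, image))
```

In this framing, IMLM and IDA both condition on the image and generate text, while TIGF reverses the direction and generates image features from text.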