What is: VirTex?
Source | VirTex: Learning Visual Representations from Textual Annotations |
Year | 2000 |
Data Source | CC BY-SA - https://paperswithcode.com |
VirText, or Visual representations from Textual annotations is a pretraining approach using semantically dense captions to learn visual representations. First a ConvNet and Transformer are jointly trained from scratch to generate natural language captions for images. Then, the learned features are transferred to downstream visual recognition tasks.