AICurious Logo

What is: VirTex?

SourceVirTex: Learning Visual Representations from Textual Annotations
Year2000
Data SourceCC BY-SA - https://paperswithcode.com

VirText, or Visual representations from Textual annotations is a pretraining approach using semantically dense captions to learn visual representations. First a ConvNet and Transformer are jointly trained from scratch to generate natural language captions for images. Then, the learned features are transferred to downstream visual recognition tasks.