What is: LV-ViT?
Source | All Tokens Matter: Token Labeling for Training Better Vision Transformers |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
LV-ViT is a type of vision transformer that uses token labeling as a training objective. Unlike the standard ViT training objective, which computes the classification loss on an additional trainable class token, token labeling takes advantage of all the image patch tokens to compute the training loss in a dense manner. Specifically, token labeling reformulates the image classification problem into multiple token-level recognition problems and assigns each patch token an individual location-specific supervision signal generated by a machine annotator.
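In other words, the total training loss combines the usual class-token classification loss with an auxiliary dense loss averaged over all patch tokens, each supervised by its own soft label. The snippet below is a minimal PyTorch sketch of this idea; the tensor names (`cls_logits`, `patch_logits`, `token_labels`) and the weighting hyperparameter `beta` are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_targets):
    """Cross-entropy against soft (probability) targets."""
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

def token_labeling_loss(cls_logits, patch_logits, image_label, token_labels, beta=0.5):
    """
    cls_logits:   (B, C)     prediction from the trainable class token
    patch_logits: (B, N, C)  per-token predictions from the N patch tokens
    image_label:  (B,)       ordinary image-level class indices
    token_labels: (B, N, C)  location-specific soft labels from a machine annotator
    """
    # Standard image-level classification loss on the class token.
    cls_loss = F.cross_entropy(cls_logits, image_label)

    # Dense auxiliary loss: every patch token is treated as its own
    # token-level recognition problem against its soft label.
    aux_loss = soft_cross_entropy(
        patch_logits.reshape(-1, patch_logits.size(-1)),
        token_labels.reshape(-1, token_labels.size(-1)),
    )

    # The auxiliary term is added with a weighting coefficient beta.
    return cls_loss + beta * aux_loss
```

In this sketch the machine annotator (e.g., a strong pretrained classifier run on the same patch grid) would supply `token_labels` offline, so the dense supervision adds little training overhead.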