**Pyramid Vision Transformer v2** (PVTv2) is a type of [Vision Transformer](https://paperswithcode.com/method/vision-transformer) for detection and segmentation tasks. It improves on [PVTv1](https://paperswithcode.com/method/pvt) through several design improvements: (1) overlapping patch embedding, (2) convolutional feed-forward networks, and (3) linear complexity attention layers that are orthogonal to the PVTv1 framework.

**CurricularFace**, or **Adaptive Curriculum Learning**, is a method for face recognition that embeds the idea of curriculum learning into the loss function to achieve a new training scheme. This training scheme mainly addresses easy samples in the early training stage and hard ones in the later stage. Specifically, CurricularFace adaptively adjusts the relative importance of easy and hard samples during different training stages.

CurricularFace

CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition

PVTv2

PVT v2: Improved Baselines with Pyramid Vision Transformer

VLMo is a unified vision-language pre-trained model that jointly learns a dual encoder and a fusion encoder with a modular Transformer network. A Mixture-of-Modality-Experts (MOME) transformer is introduced to encode different modalities which helps it to capture modality-specific information by modality experts, and align content of different modalities by the self-attention module shared across modalities. The model parameters are shared across image-text contrastive learning, masked language modeling, and image-text matching tasks. During fine-tuning, the flexible modeling allows for VLMO to be used as either a dual encoder (i.e., separately encode images and text for retrieval tasks) or a fusion encoder (i.e., jointly encode image-text pairs for better interaction across modalities) Stage-wise pretraining on image-only and text-only data improved the vision-language pre-trained model. The model can be used for classification tasks and fine-tuned as a dual encoder for retrieval tasks.

Source	PVT v2: Improved Baselines with Pyramid Vision Transformer
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software

What is: Pyramid Vision Transformer v2?

Viet-Anh on Software