The **Swin Transformer** is a type of [Vision Transformer](https://paperswithcode.com/method/vision-transformer). It builds hierarchical feature maps by merging image patches in deeper layers, and its computational complexity is linear in the input image size because self-attention is computed only within each local window. It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. In contrast, previous vision Transformers produce feature maps of a single low resolution and have computational complexity that is quadratic in the input image size because self-attention is computed globally.
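The linear-versus-quadratic claim can be checked with a rough count of query-key pairs. The sketch below is only an illustration: the function names are hypothetical, and the 56×56 token grid and window size of 7 are assumed example values, not figures taken from the text above.

```python
def global_attention_pairs(height, width):
    """Query-key pairs when every token attends to every other token."""
    tokens = height * width
    return tokens * tokens


def window_attention_pairs(height, width, window):
    """Query-key pairs when attention is restricted to non-overlapping
    window x window blocks (assumes the grid divides evenly)."""
    if height % window or width % window:
        raise ValueError("grid size must be a multiple of the window size")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    for side in (56, 112):
        print(
            side,
            global_attention_pairs(side, side),
            window_attention_pairs(side, side, window=7),
        )
```

Doubling the side of the grid quadruples the number of tokens. The windowed count then grows by a factor of 4 (linear in the number of tokens), while the global count grows by a factor of 16 (quadratic).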

**Quasi-Hyperbolic Momentum (QHM)** is a stochastic optimization technique that alters [momentum SGD](https://paperswithcode.com/method/sgd-with-momentum) by averaging a plain [SGD](https://paperswithcode.com/method/sgd) step with a momentum step:

$$ g_{t+1} = \beta g_{t} + \left(1-\beta\right)\cdot\nabla\hat{L}_{t}\left(\theta_{t}\right) $$
$$ \theta_{t+1} = \theta_{t} - \alpha\left[\left(1-v\right)\cdot\nabla\hat{L}_{t}\left(\theta_{t}\right) + v\cdot g_{t+1}\right] $$

The authors suggest a rule of thumb of $v = 0.7$ and $\beta = 0.999$.
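As a minimal sketch of the two update equations, the function below applies one QHM step to a parameter vector stored as a plain Python list. The name `qhm_step` and the default learning rate `alpha=0.1` are assumptions made for illustration; only `v = 0.7` and `beta = 0.999` come from the rule of thumb above.

```python
def qhm_step(theta, g, grad, alpha=0.1, beta=0.999, v=0.7):
    """One QHM update.

    theta: current parameters, g: momentum buffer, grad: stochastic gradient
    evaluated at theta. Returns (theta_next, g_next).
    alpha=0.1 is an illustrative default, not a recommended value.
    """
    # g_{t+1} = beta * g_t + (1 - beta) * grad
    g_next = [beta * gi + (1.0 - beta) * di for gi, di in zip(g, grad)]
    # theta_{t+1} = theta_t - alpha * [(1 - v) * grad + v * g_{t+1}]
    theta_next = [
        ti - alpha * ((1.0 - v) * di + v * gi)
        for ti, di, gi in zip(theta, grad, g_next)
    ]
    return theta_next, g_next


if __name__ == "__main__":
    theta, g = [1.0, -2.0], [0.0, 0.0]
    grad = [0.5, -1.0]  # made-up gradient values
    theta, g = qhm_step(theta, g, grad)
    print(theta, g)
```

Setting `v=0` in this sketch recovers a plain SGD step, and `v=1` recovers a momentum step using the exponentially averaged gradient, which is the interpolation the equations describe.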

Quasi-hyperbolic momentum and Adam for deep learning


**CenterNet** is a one-stage object detector that detects each object as a triplet, rather than a pair, of keypoints. It utilizes two customized modules named [cascade corner pooling](https://paperswithcode.com/method/cascade-corner-pooling) and [center pooling](https://paperswithcode.com/method/center-pooling), which play the roles of enriching information collected by both top-left and bottom-right corners and providing more recognizable information at the central regions, respectively. The intuition is that, if a predicted bounding box has a high IoU with the ground-truth box, then the probability that the center keypoint in its central region is predicted as the same class is high, and vice versa. Thus, during inference, after a proposal is generated as a pair of corner keypoints, we determine if the proposal is indeed an object by checking if there is a center keypoint of the same class falling within its central region.
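The inference-time check described above can be sketched as follows. This is a hedged illustration, not the detector's actual code: the function names are hypothetical, and defining the central region as the middle `1/n` of the box along each axis (with `n=3`) is an assumption that the paragraph above does not spell out.

```python
def central_region(box, n=3):
    """Return the central region of a box (tlx, tly, brx, bry).

    Assumption: the region is the middle 1/n of the box along each axis.
    """
    tlx, tly, brx, bry = box
    ctlx = ((n + 1) * tlx + (n - 1) * brx) / (2 * n)
    ctly = ((n + 1) * tly + (n - 1) * bry) / (2 * n)
    cbrx = ((n - 1) * tlx + (n + 1) * brx) / (2 * n)
    cbry = ((n - 1) * tly + (n + 1) * bry) / (2 * n)
    return ctlx, ctly, cbrx, cbry


def keep_proposal(box, box_class, center_keypoints, n=3):
    """Keep a corner-pair proposal only if a center keypoint of the same
    class lies inside its central region.

    center_keypoints: iterable of (x, y, class_id) tuples.
    """
    ctlx, ctly, cbrx, cbry = central_region(box, n)
    return any(
        cls == box_class and ctlx <= x <= cbrx and ctly <= y <= cbry
        for x, y, cls in center_keypoints
    )


if __name__ == "__main__":
    proposal = (0.0, 0.0, 30.0, 30.0)
    print(keep_proposal(proposal, 1, [(15.0, 15.0, 1)]))
    print(keep_proposal(proposal, 1, [(5.0, 5.0, 1)]))
```

A proposal whose central region contains no center keypoint of the matching class is discarded, which is how the triplet formulation filters out boxes formed by mismatched corner pairs.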

Source	Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Year	2021
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software

What is: Swin Transformer?
