What is: Segmentation Transformer?

Segmentation Transformer, or SETR, is a Transformer-based segmentation model. The transformer-alone encoder treats an input image as a sequence of image patches represented by learned patch embedding, and transforms the sequence with global self-attention modeling for discriminative feature representation learning. Concretely, we first decompose an image into a grid of fixed-sized patches, forming a sequence of patches. With a linear embedding layer applied to the flattened pixel vectors of every patch, we then obtain a sequence of feature embedding vectors as the input to a transformer. Given the learned features from the encoder transformer, a decoder is then used to recover the original image resolution. Crucially there is no downsampling in spatial resolution but global context modeling at every layer of the encoder transformer.

Source	Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com

Viet-Anh on Software

What is: Segmentation Transformer?

Viet-Anh on Software