What is: Blender?

Source: BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

Blender is a proposal-based instance mask generation module that combines rich instance-level information with accurate dense pixel features. A single convolution layer is added on top of the detection towers to produce attention masks along with each bounding box prediction. For each predicted instance, the blender crops the predicted bases with its bounding box and linearly combines them according to the learned attention maps.

The inputs of the blender module are the bottom-level bases $\mathbf{B}$, the selected top-level attentions $A$ and the bounding box proposals $P$. First, we use the RoIPool of Mask R-CNN to crop the bases with each proposal $\mathbf{p}_{d}$ and then resize the region to a fixed-size $R \times R$ feature map $\mathbf{r}_{d}$:

$$\mathbf{r}_{d}=\operatorname{RoIPool}_{R \times R}\left(\mathbf{B}, \mathbf{p}_{d}\right), \quad \forall d \in\{1 \ldots D\}$$

More specifically, a sampling ratio of 1 is used for RoIAlign, i.e. one bin for each sampling point. During training, ground truth boxes are used as the proposals. During inference, the FCOS prediction results are used.
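The cropping step can be illustrated with a minimal NumPy sketch. It is not the BlendMask or Mask R-CNN implementation: `bilinear_sample` and `crop_bases` are hypothetical helper names, the box is assumed to be given as `(x_min, y_min, x_max, y_max)` in base-feature coordinates, and pixel centers are assumed to sit at integer coordinates. Each of the $R \times R$ bins takes a single bilinear sample at its center, which is one way to read "sampling ratio 1".

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample feat of shape (K, H, W) at a real-valued location (y, x)."""
    _, H, W = feat.shape
    # Clamp the location so that all four neighbors stay inside the map.
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feat[:, y0, x0] + wx * feat[:, y0, x1]
    bottom = (1 - wx) * feat[:, y1, x0] + wx * feat[:, y1, x1]
    return (1 - wy) * top + wy * bottom

def crop_bases(bases, box, R):
    """Crop bases (K, H, W) with one box into a (K, R, R) region, one sample per bin."""
    x_min, y_min, x_max, y_max = box
    bin_h = (y_max - y_min) / R
    bin_w = (x_max - x_min) / R
    region = np.empty((bases.shape[0], R, R), dtype=np.float64)
    for i in range(R):
        for j in range(R):
            # Sample once at the center of bin (i, j).
            cy = y_min + (i + 0.5) * bin_h
            cx = x_min + (j + 0.5) * bin_w
            region[:, i, j] = bilinear_sample(bases, cy, cx)
    return region
```

Calling `crop_bases(bases, box, R)` once per proposal $\mathbf{p}_{d}$ yields the regions $\mathbf{r}_{d}$. A production RoIAlign is vectorized and handles coordinate offsets and feature strides, which this sketch leaves out.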

The attention size $M$ is smaller than $R$. We interpolate $\mathbf{a}_{d}$ from $M \times M$ to $R \times R$, into the shapes of $R=\left\{\mathbf{r}_{d} \mid d=1 \ldots D\right\}$:

$$\mathbf{a}_{d}^{\prime}=\operatorname{interpolate}_{M \times M \rightarrow R \times R}\left(\mathbf{a}_{d}\right), \quad \forall d \in\{1 \ldots D\}$$

Then $\mathbf{a}_{d}^{\prime}$ is normalized with a softmax function along the $K$ dimension to make it a set of score maps $\mathbf{s}_{d}$:

$$\mathbf{s}_{d}=\operatorname{softmax}\left(\mathbf{a}_{d}^{\prime}\right), \quad \forall d \in\{1 \ldots D\}$$
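The interpolation and normalization steps can be sketched together in NumPy. The text does not state which interpolation mode is used, so the nearest-neighbor resize below is an assumption chosen for brevity. The function names `interpolate_nearest` and `softmax_over_bases` are illustrative, and the attention for one proposal is assumed to have shape `(K, M, M)`.

```python
import numpy as np

def interpolate_nearest(attn, R):
    """Resize attn of shape (K, M, M) to (K, R, R) by nearest-neighbor lookup."""
    M = attn.shape[-1]
    # Map the center of each output cell back to a source index.
    idx = np.minimum((np.arange(R) + 0.5) * M / R, M - 1).astype(int)
    return attn[:, idx][:, :, idx]

def softmax_over_bases(attn):
    """Softmax along the K axis (axis 0), shifted by the max for numerical stability."""
    shifted = attn - attn.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)
```

For a single proposal, `softmax_over_bases(interpolate_nearest(a_d, R))` gives score maps whose $K$ values sum to 1 at every pixel, so each pixel of the final mask is a convex combination of the cropped bases.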

Then we apply an element-wise product between each entity $\mathbf{r}_{d}, \mathbf{s}_{d}$ of the regions $R$ and the scores $S$, and sum along the $K$ dimension to get our mask logit $\mathbf{m}_{d}$:

$$\mathbf{m}_{d}=\sum_{k=1}^{K} \mathbf{s}_{d}^{k} \circ \mathbf{r}_{d}^{k}, \quad \forall d \in\{1 \ldots D\}$$

where $k$ is the index of the basis. The mask blending process with $K=4$ is visualized in the Figure.
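The blending sum itself is a one-line reduction. The sketch below assumes the regions and scores for all proposals are stacked into arrays of shape `(D, K, R, R)`. The name `blend` and the toy values are illustrative only.

```python
import numpy as np

def blend(regions, scores):
    """Compute m_d = sum_k s_d^k * r_d^k for arrays of shape (D, K, R, R)."""
    # Element-wise product, then sum over the basis axis K (axis 1).
    return (scores * regions).sum(axis=1)

# Toy example with D=1 proposal, K=2 bases and R=2.
regions = np.stack([np.full((2, 2), 1.0), np.full((2, 2), 3.0)])[None]
scores = np.stack([np.full((2, 2), 0.25), np.full((2, 2), 0.75)])[None]
mask_logits = blend(regions, scores)  # every entry is 0.25 * 1 + 0.75 * 3 = 2.5
```

The result has shape `(D, R, R)`, one mask logit map per proposal.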