What is: Blender?
Source | BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
Blender is a proposal-based instance mask generation module which incorporates rich instance-level information with accurate dense pixel features. A single convolution layer is added on top of the detection towers to produce attention masks along with each bounding box prediction. For each predicted instance, the blender crops the predicted bases with its bounding box and linearly combines them according to the learned attention maps.
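The inputs of the blender module are the bottom-level bases $\mathbf{B}$, the selected top-level attentions $\mathbf{A}$ and the bounding box proposals $P$. First, the RoI pooler of Mask R-CNN is used to crop the bases with each proposal $\mathbf{p}_{d}$ and then resize the region to a fixed-size $R \times R$ feature map $\mathbf{r}_{d}$:
$$\mathbf{r}_{d} = \text{RoIPool}_{R \times R}\left(\mathbf{B}, \mathbf{p}_{d}\right), \quad \forall d \in \{1 \ldots D\}$$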
More specifically, a sampling ratio of 1 is used for RoIAlign, i.e. one sampling point per bin. During training, ground truth boxes are used as the proposals. During inference, FCOS prediction results are used.
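The cropping step can be illustrated with a minimal NumPy sketch of RoIAlign with sampling ratio 1, in which each of the output bins takes a single bilinear sample at its centre. The function names `bilinear_sample` and `crop_bases` are hypothetical, and the sketch assumes that the box is already given in the coordinate frame of the bases and that feature values sit at integer coordinates; actual implementations may use a different pixel-alignment convention.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample feat of shape (K, H, W) at a continuous point (y, x)."""
    _, H, W = feat.shape
    # Clamp so that the four neighbours stay inside the map.
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feat[:, y0, x0] + wx * feat[:, y0, x1]
    bottom = (1 - wx) * feat[:, y1, x0] + wx * feat[:, y1, x1]
    return (1 - wy) * top + wy * bottom

def crop_bases(bases, box, R):
    """Crop bases (K, H, W) with box (x1, y1, x2, y2) into a (K, R, R) region.

    Assumption: one sample at the centre of each of the R x R bins,
    i.e. a sampling ratio of 1.
    """
    x1, y1, x2, y2 = box
    bin_h = (y2 - y1) / R
    bin_w = (x2 - x1) / R
    region = np.zeros((bases.shape[0], R, R), dtype=float)
    for i in range(R):
        for j in range(R):
            cy = y1 + (i + 0.5) * bin_h  # centre of bin row i
            cx = x1 + (j + 0.5) * bin_w  # centre of bin column j
            region[:, i, j] = bilinear_sample(bases, cy, cx)
    return region

# Illustrative sizes only: 4 bases on a 32 x 32 map, cropped to 8 x 8.
bases = np.random.default_rng(0).normal(size=(4, 32, 32))
region = crop_bases(bases, (5.0, 6.0, 21.0, 30.0), R=8)
print(region.shape)
```

The explicit loops are kept for readability; a practical version would vectorise the sampling or rely on a library RoIAlign operator.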
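The attention size $M$ is smaller than the region size $R$. We interpolate each attention $\mathbf{a}_{d}$ from $M \times M$ to $R \times R$, into the shapes of $R=\left\{\mathbf{r}_{d} \mid d=1 \ldots D\right\}$:
$$\mathbf{a}_{d}^{\prime} = \text{interpolate}_{M \times M \rightarrow R \times R}\left(\mathbf{a}_{d}\right), \quad \forall d \in \{1 \ldots D\}$$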
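As a rough illustration of this resizing step, the sketch below upsamples a stack of attention maps with bilinear interpolation in NumPy. The choice of bilinear interpolation, the half-pixel coordinate mapping, and the name `resize_bilinear` are assumptions made for this example; the text above only states that the attentions are interpolated.

```python
import numpy as np

def resize_bilinear(att, R):
    """Resize att of shape (K, M, M) to (K, R, R) by bilinear interpolation."""
    _, M, _ = att.shape
    # Map each output pixel centre back to input coordinates
    # (half-pixel convention, an assumption of this sketch).
    src = (np.arange(R) + 0.5) * M / R - 0.5
    src = np.clip(src, 0, M - 1)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, M - 1)
    w = src - lo  # weight of the upper neighbour
    # Interpolate along rows first, then along columns.
    rows = att[:, lo, :] * (1 - w)[None, :, None] + att[:, hi, :] * w[None, :, None]
    out = rows[:, :, lo] * (1 - w)[None, None, :] + rows[:, :, hi] * w[None, None, :]
    return out

# Illustrative sizes only: K = 4 attention maps, M = 7, R = 28.
att = np.random.default_rng(0).normal(size=(4, 7, 7))
att_resized = resize_bilinear(att, 28)
print(att_resized.shape)
```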
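Then the interpolated attention $\mathbf{a}_{d}^{\prime}$ is normalized with a softmax function along the $K$ dimension (one channel per basis) to make it a set of score maps $\mathbf{s}_{d}$:
$$\mathbf{s}_{d} = \text{softmax}\left(\mathbf{a}_{d}^{\prime}\right), \quad \forall d \in \{1 \ldots D\}$$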
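The normalization can be sketched in a few lines of NumPy. The function name `softmax_over_bases` and the `(K, R, R)` array layout are assumptions for illustration; the point is only that, at every pixel, the scores across the bases sum to one.

```python
import numpy as np

def softmax_over_bases(att):
    """Softmax over axis 0 (the basis axis) of att with shape (K, R, R)."""
    # Subtract the per-pixel maximum for numerical stability.
    shifted = att - att.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

# Illustrative sizes only.
att = np.random.default_rng(0).normal(size=(4, 8, 8))
scores = softmax_over_bases(att)
print(np.allclose(scores.sum(axis=0), 1.0))
```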
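Then we apply an element-wise product between each pair $\mathbf{r}_{d}$, $\mathbf{s}_{d}$ of the regions $R$ and scores $S$, and sum along the $K$ dimension to get our mask logit $\mathbf{m}_{d}$:
$$\mathbf{m}_{d} = \sum_{k=1}^{K} \mathbf{s}_{d}^{k} \circ \mathbf{r}_{d}^{k}, \quad \forall d \in \{1 \ldots D\}$$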
where $k$ is the index of the basis. The mask blending process with $K=4$ is visualized in the Figure.
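The blending step described above (element-wise product of regions and score maps, summed over the bases) reduces to a single broadcasted expression. The sketch below assumes both inputs are stored as arrays of shape `(D, K, R, R)`; the function name `blend` and this layout are illustrative choices, not taken from the source.

```python
import numpy as np

def blend(regions, scores):
    """Weighted sum over the basis axis.

    regions, scores: arrays of shape (D, K, R, R).
    Returns mask logits of shape (D, R, R).
    """
    return (regions * scores).sum(axis=1)

# Illustrative sizes only: D = 2 instances, K = 4 bases, R = 8.
rng = np.random.default_rng(0)
regions = rng.normal(size=(2, 4, 8, 8))
raw = rng.normal(size=(2, 4, 8, 8))
# Softmax over the basis axis so the weights sum to one at each pixel.
exp = np.exp(raw - raw.max(axis=1, keepdims=True))
scores = exp / exp.sum(axis=1, keepdims=True)
mask_logits = blend(regions, scores)
print(mask_logits.shape)
```

Because the scores sum to one over the bases at each pixel, every mask logit is a convex combination of the cropped basis values at that location.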