What is: GridMask?

Source: GridMask Data Augmentation
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

GridMask is a data augmentation method that randomly removes some pixels of an input image. Unlike other methods, the region it removes is neither a single continuous region nor a set of randomly chosen pixels as in dropout. Instead, the algorithm removes a region made up of disconnected pixel sets, as shown in the Figure.

We express the setting as

$$\tilde{\mathbf{x}} = \mathbf{x} \times M$$

where $\mathbf{x} \in \mathbb{R}^{H \times W \times C}$ represents the input image, $M \in \{0,1\}^{H \times W}$ is the binary mask that stores pixels to be removed, and $\tilde{\mathbf{x}} \in \mathbb{R}^{H \times W \times C}$ is the result produced by the algorithm. For the binary mask $M$, if $M_{i,j} = 1$ we keep pixel $(i, j)$ in the input image; otherwise we remove it. GridMask is applied after the image normalization operation.
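The masking step can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `apply_mask` is hypothetical, and it assumes the image is stored as a height × width × channels array that has already been normalized.

```python
import numpy as np


def apply_mask(x: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return x with every pixel zeroed where mask == 0.

    x    : array of shape (H, W, C), assumed to be already normalized
    mask : array of shape (H, W) with entries in {0, 1}
    """
    if x.shape[:2] != mask.shape:
        raise ValueError("mask must have the same height and width as x")
    # Add a trailing axis so the (H, W) mask broadcasts over all C channels.
    return x * mask[:, :, None].astype(x.dtype)


# Tiny 2x2 RGB example: keep the diagonal pixels, remove the other two.
x = np.arange(2 * 2 * 3, dtype=np.float32).reshape(2, 2, 3)
mask = np.array([[1, 0],
                 [0, 1]])
x_tilde = apply_mask(x, mask)
```

Because the mask has no channel axis, a removed pixel is set to zero in every channel at once; on a normalized image, zero typically corresponds to the dataset mean rather than to black.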

The shape of $M$ looks like a grid, as shown in the Figure. Four numbers $(r, d, \delta_x, \delta_y)$ are used to represent a unique $M$. Every mask is formed by tiling the units. $r$ is the ratio of the shorter gray edge in a unit. $d$ is the length of one unit. $\delta_x$ and $\delta_y$ are the distances between the first intact unit and the boundary of the image.
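One way to build such a mask from $(r, d, \delta_x, \delta_y)$ is sketched below in NumPy. The function name `grid_mask` and the exact layout inside a unit are assumptions made for illustration: here each $d \times d$ unit keeps a band of width roughly $r \cdot d$ along its top and left edges and removes the remaining square, and the offsets simply shift where the first unit starts. The original implementation may round or position things differently.

```python
import numpy as np


def grid_mask(height, width, r, d, delta_x=0, delta_y=0):
    """Build a binary (height, width) mask by tiling d x d units.

    r        : ratio of the kept (gray) edge within one unit, 0 < r < 1
    d        : side length of one unit, in pixels
    delta_x  : horizontal offset of the first full unit (assumed convention)
    delta_y  : vertical offset of the first full unit (assumed convention)
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    if d < 1:
        raise ValueError("d must be a positive number of pixels")

    keep = int(round(r * d))  # width of the kept band in each unit

    # Position of every row and column inside its unit, after the offsets.
    rows = (np.arange(height) - delta_y) % d
    cols = (np.arange(width) - delta_x) % d

    # A pixel is removed only if it lies past the kept band in BOTH directions,
    # which yields one removed square per unit.
    removed = (rows >= keep)[:, None] & (cols >= keep)[None, :]
    return (~removed).astype(np.uint8)


m = grid_mask(8, 8, r=0.5, d=4)
m_shifted = grid_mask(8, 8, r=0.5, d=4, delta_x=1, delta_y=1)
```

With `r=0.5` and `d=4`, each 4 × 4 unit keeps 12 of its 16 pixels and removes a 2 × 2 square, so the removed squares are disconnected from one another, as the grid description requires.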