What is: Split Attention?
Source | ResNeSt: Split-Attention Networks |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
A Split Attention block enables attention across feature-map groups. As in ResNeXt blocks, the feature can be divided into several groups, and the number of feature-map groups is given by a cardinality hyperparameter $K$. The resulting feature-map groups are called cardinal groups. Split Attention blocks introduce a new radix hyperparameter $R$ that indicates the number of splits within a cardinal group, so the total number of feature groups is $G = KR$. A series of transformations $\{\mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{F}_G\}$ is applied to the individual groups, so the intermediate representation of each group is $U_i = \mathcal{F}_i(X)$ for $i \in \{1, 2, \ldots, G\}$, where $X$ is the block input.
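As a small illustration of how the $G = KR$ feature groups relate to the cardinal groups, the sketch below lists which group indices belong to each cardinal group. It assumes the $R$ splits of cardinal group $k$ occupy the consecutive indices $R(k-1)+1$ through $Rk$, which is the layout used by the summation in the ResNeSt paper; the function name `cardinal_group_indices` is hypothetical.

```python
def cardinal_group_indices(K, R):
    """Map each cardinal group k (1-based) to the 1-based indices of its R splits.

    Assumes splits of one cardinal group are stored contiguously:
    group k owns indices R*(k-1)+1 .. R*k.
    """
    return {k: list(range(R * (k - 1) + 1, R * k + 1)) for k in range(1, K + 1)}


K, R = 2, 3
layout = cardinal_group_indices(K, R)
G = K * R  # total number of feature groups
print(G, layout)
```

With cardinality 2 and radix 3 there are 6 feature groups: the first cardinal group holds groups 1 to 3 and the second holds groups 4 to 6.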
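A combined representation for each cardinal group can be obtained by fusing via an element-wise summation across multiple splits. The representation for the $k$-th cardinal group is $\hat{U}^k = \sum_{j=R(k-1)+1}^{Rk} U_j$, where $\hat{U}^k \in \mathbb{R}^{H \times W \times C/K}$ for $k \in \{1, 2, \ldots, K\}$, and $H$, $W$ and $C$ are the block output feature-map sizes. Global contextual information with embedded channel-wise statistics can be gathered with global average pooling across spatial dimensions, giving $s^k \in \mathbb{R}^{C/K}$. Here the $c$-th component is calculated as:

$$ s^k_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \hat{U}^k_c(i, j). $$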
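The fusion and pooling steps can be sketched in NumPy as follows. This is a minimal sketch, not the reference implementation: it assumes the group outputs are stacked in one array of shape `(G, H, W, C // K)` with the splits of each cardinal group stored contiguously, and the function name `fuse_and_pool` is hypothetical.

```python
import numpy as np


def fuse_and_pool(U, K, R):
    """Sum the R splits of each cardinal group, then average over space.

    U: array of shape (G, H, W, C // K) with G = K * R (assumed layout).
    Returns U_hat of shape (K, H, W, C // K) and s of shape (K, C // K).
    """
    G, H, W, Ck = U.shape
    if G != K * R:
        raise ValueError("expected G == K * R groups")
    # Group index g (0-based) equals k * R + r, so a C-order reshape
    # separates the cardinal axis from the split axis.
    U_hat = U.reshape(K, R, H, W, Ck).sum(axis=1)
    # Global average pooling over the spatial dimensions H and W.
    s = U_hat.mean(axis=(1, 2))
    return U_hat, s


U = np.ones((6, 4, 5, 3))  # K=2, R=3, H=4, W=5, C/K=3
U_hat, s = fuse_and_pool(U, K=2, R=3)
print(U_hat.shape, s.shape)
```

Because every split is filled with ones here, each entry of the fused representation and of the pooled vector equals the radix, 3.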
A weighted fusion of the cardinal group representation $V^k \in \mathbb{R}^{H \times W \times C/K}$ is aggregated using channel-wise soft attention, where each feature-map channel is produced using a weighted combination over splits. The $c$-th channel is calculated as:

$$ V^k_c = \sum_{i=1}^{R} a^k_i(c) \, U_{R(k-1)+i}, $$
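where $a^k_i(c)$ denotes a (soft) assignment weight given by:

$$ a^k_i(c) = \begin{cases} \dfrac{\exp(\mathcal{G}^c_i(s^k))}{\sum_{j=1}^{R} \exp(\mathcal{G}^c_j(s^k))} & \text{if } R > 1, \\ \dfrac{1}{1 + \exp(-\mathcal{G}^c_i(s^k))} & \text{if } R = 1, \end{cases} $$

and the mapping $\mathcal{G}^c_i$ determines the weight of each split for the $c$-th channel based on the global context representation $s^k$.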
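The attention weights and the weighted fusion can be sketched together in NumPy. This is an illustrative sketch under stated assumptions: the outputs of the mapping $\mathcal{G}$ are passed in as a precomputed `logits` array of shape `(K, R, C // K)` rather than computed from $s^k$, the group outputs use the same stacked `(G, H, W, C // K)` layout with contiguous splits, and the function name `split_attention` is hypothetical.

```python
import numpy as np


def split_attention(U, logits, K, R):
    """Channel-wise soft attention over the R splits of each cardinal group.

    U:      (G, H, W, C // K) group outputs, G = K * R (assumed layout).
    logits: (K, R, C // K), standing in for the outputs of the mapping G.
    Returns V of shape (K, H, W, C // K) and weights a of shape (K, R, C // K).
    """
    G, H, W, Ck = U.shape
    if G != K * R:
        raise ValueError("expected G == K * R groups")
    U = U.reshape(K, R, H, W, Ck)
    if R > 1:
        # Softmax across the split axis, shifted by the max for stability.
        z = logits - logits.max(axis=1, keepdims=True)
        e = np.exp(z)
        a = e / e.sum(axis=1, keepdims=True)
    else:
        # With a single split the weight is a sigmoid gate.
        a = 1.0 / (1.0 + np.exp(-logits))
    # Broadcast the per-channel weights over H and W, then sum over splits.
    V = (a[:, :, None, None, :] * U).sum(axis=1)
    return V, a


rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3, 3, 2))  # K=2, R=2, H=W=3, C/K=2
V, a = split_attention(U, np.zeros((2, 2, 2)), K=2, R=2)
print(V.shape, a[0, :, 0])
```

With all logits equal, the softmax assigns the same weight to every split, so each cardinal group output reduces to the mean of its splits. In the single-split case the weights are not normalised across splits, and the block acts as a per-channel gate on the lone split.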