What is: Span-Based Dynamic Convolution?
Source | ConvBERT: Improving BERT with Span-based Dynamic Convolution |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
Span-Based Dynamic Convolution is a type of convolution used in the ConvBERT architecture to capture local dependencies between tokens. Kernels are generated from a local span of tokens around the current token rather than from the token alone, which makes better use of local dependencies and distinguishes different meanings of the same token (e.g., if “a” precedes “can” in the input sentence, “can” is evidently a noun, not a verb).
Specifically, classic convolution uses fixed parameters shared across all input tokens. Dynamic convolution is more flexible: it uses a kernel generator to produce a different kernel for each input token, so it can capture the local dependencies of different tokens. However, because the kernel is generated from the current token alone, such dynamic convolution cannot differentiate the same token in different contexts and generates the same kernel for every occurrence (e.g., the three “can” in Figure (b)).
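As a rough illustration of this limitation, the following is a minimal Python sketch of a token-based dynamic convolution. It is a simplification under stated assumptions, not ConvBERT's implementation: the names `token_dynamic_conv` and `W_gen`, the random embeddings, the softmax-normalised kernel shared across all channels, and the example sentence (chosen only because it contains three “can” tokens, not taken from the figure) are all hypothetical.

```python
import math
import random

def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def token_dynamic_conv(X, W_gen):
    """Dynamic convolution whose kernel at position i is generated from X[i] only.

    X:     n token vectors of size d.
    W_gen: k rows of size d (hypothetical kernel-generator weights).
    Returns (outputs, kernels), one entry per position.
    """
    n, d, k = len(X), len(X[0]), len(W_gen)
    half = k // 2
    outputs, kernels = [], []
    for i in range(n):
        # One score per kernel slot, computed from the current token alone.
        scores = [sum(w * x for w, x in zip(row, X[i])) for row in W_gen]
        kernel = softmax(scores)
        out = [0.0] * d
        for j, weight in enumerate(kernel):
            pos = i + j - half
            if 0 <= pos < n:  # positions outside the sentence act as zero padding
                for c in range(d):
                    out[c] += weight * X[pos][c]
        outputs.append(out)
        kernels.append(kernel)
    return outputs, kernels

rng = random.Random(0)
d, k = 4, 3  # embedding size and kernel width (illustrative values)
tokens = ["can", "you", "can", "a", "can"]
# Random stand-in embeddings: every "can" shares the same vector.
vocab = {t: [rng.gauss(0, 1) for _ in range(d)] for t in sorted(set(tokens))}
X = [vocab[t] for t in tokens]
W_gen = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]

outputs, kernels = token_dynamic_conv(X, W_gen)
# All three "can" tokens receive an identical kernel, whatever their context.
print(kernels[0] == kernels[2] == kernels[4])
```

The outputs at the three positions still differ, because the shared kernel is applied to different neighbours; what the generator cannot do is adapt the kernel itself to the context.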
Span-based dynamic convolution was therefore developed to produce more adaptive convolution kernels: the kernel generator receives an input span instead of only a single token, so the kernels generated for the same token in different contexts can differ. For example, as shown in Figure (c), span-based dynamic convolution produces different kernels for the different “can” tokens.
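The idea can be sketched in a few lines of Python. This is a simplified illustration and not the paper's exact formulation: here the generator simply applies a linear map to the concatenated vectors of the span, and the names `span_dynamic_conv` and `W_span`, the random embeddings, and the example sentence are assumptions made for this sketch.

```python
import math
import random

def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def span_dynamic_conv(X, W_span):
    """Dynamic convolution whose kernel at position i is generated from a span.

    X:      n token vectors of size d.
    W_span: k rows of size k * d (hypothetical generator weights over a k-token span).
    Returns (outputs, kernels), one entry per position.
    """
    n, d, k = len(X), len(X[0]), len(W_span)
    half = k // 2
    outputs, kernels = [], []
    for i in range(n):
        window = range(i - half, i - half + k)
        # Concatenate the k token vectors around position i, zero-padding at the edges.
        span = []
        for pos in window:
            span.extend(X[pos] if 0 <= pos < n else [0.0] * d)
        scores = [sum(w * x for w, x in zip(row, span)) for row in W_span]
        kernel = softmax(scores)
        out = [0.0] * d
        for weight, pos in zip(kernel, window):
            if 0 <= pos < n:
                for c in range(d):
                    out[c] += weight * X[pos][c]
        outputs.append(out)
        kernels.append(kernel)
    return outputs, kernels

rng = random.Random(0)
d, k = 4, 3  # embedding size and span/kernel width (illustrative values)
tokens = ["can", "you", "can", "a", "can"]
# Random stand-in embeddings: every "can" shares the same vector.
vocab = {t: [rng.gauss(0, 1) for _ in range(d)] for t in sorted(set(tokens))}
X = [vocab[t] for t in tokens]
W_span = [[rng.gauss(0, 1) for _ in range(k * d)] for _ in range(k)]

outputs, kernels = span_dynamic_conv(X, W_span)
# Each "can" sits in a different span, so each gets its own kernel.
for i, token in enumerate(tokens):
    if token == "can":
        print(i, [round(w, 3) for w in kernels[i]])
```

Because the generator input now includes the neighbouring tokens, identical token vectors in different contexts yield different scores and therefore different kernels.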