What is: DV3 Convolution Block?

Source: Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

DV3 Convolution Block is the convolutional block used in the Deep Voice 3 text-to-speech architecture. It consists of a 1-D convolution followed by a gated linear unit and a residual connection. In the figure, $c$ denotes the dimensionality of the input. The convolution output of size $2 \cdot c$ is split into two equal-sized portions: the gate vector and the input vector. A scaling factor of $\sqrt{0.5}$ is applied to preserve the input variance early in training. The gated linear unit provides a linear path for gradient flow, which alleviates the vanishing gradient issue for stacked convolution blocks while retaining non-linearity. To introduce speaker-dependent control, a speaker-dependent embedding is passed through a softsign function and added as a bias to the convolution filter output. The authors use the softsign nonlinearity because it limits the range of the output while avoiding the saturation problem that exponential-based nonlinearities sometimes exhibit. Convolution filter weights are initialized so that activations have zero mean and unit variance throughout the entire network.
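The block described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the description does not settle: the function name `dv3_conv_block` and all parameter names are hypothetical, the first half of the convolution output is treated as the input vector and the second half as the gate, the convolution is non-causal with zero padding and an odd kernel width, the speaker embedding is projected to $c$ dimensions by a hypothetical matrix `speaker_proj` and added to the input-vector half only, and dropout is omitted.

```python
import numpy as np

def softsign(x):
    # Bounded in (-1, 1) and saturates more slowly than tanh.
    return x / (1.0 + np.abs(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dv3_conv_block(x, weight, bias, speaker_embedding=None, speaker_proj=None):
    """Sketch of one block: conv -> gated linear unit -> residual -> scale.

    x:      (T, c) input sequence
    weight: (k, c, 2c) convolution filter, k assumed odd
    bias:   (2c,) convolution bias
    speaker_embedding, speaker_proj: optional (d,) vector and (d, c) matrix
    """
    T, c = x.shape
    k = weight.shape[0]
    assert k % 2 == 1, "this sketch assumes an odd kernel width"

    # Zero-pad in time so the output keeps length T.
    pad = (k - 1) // 2
    x_pad = np.pad(x, ((pad, pad), (0, 0)))

    # 1-D convolution producing 2c channels per time step.
    conv = np.stack([
        np.einsum("kc,kcd->d", x_pad[t:t + k], weight) for t in range(T)
    ]) + bias

    # Split into the input vector and the gate vector (order is an assumption).
    values, gate = conv[:, :c], conv[:, c:]

    if speaker_embedding is not None:
        # Speaker-dependent bias after a softsign (placement is an assumption).
        values = values + softsign(speaker_embedding @ speaker_proj)

    # Gated linear unit, residual connection, then sqrt(0.5) scaling.
    gated = values * sigmoid(gate)
    return (gated + x) * np.sqrt(0.5)

# Usage with random data; the 1/sqrt(k*c) weight scale is illustrative only.
rng = np.random.default_rng(0)
T, c, k = 6, 4, 3
x = rng.standard_normal((T, c))
weight = rng.standard_normal((k, c, 2 * c)) * np.sqrt(1.0 / (k * c))
bias = np.zeros(2 * c)
y = dv3_conv_block(x, weight, bias)
print(y.shape)  # (6, 4)
```

The residual sum adds two terms, so multiplying by $\sqrt{0.5}$ halves the variance of the sum; if the two terms are roughly independent with similar variance, the output variance stays close to that of the input.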