
What is: Swish?

Source: Searching for Activation Functions
Year: 2017
Data Source: CC BY-SA - https://paperswithcode.com

Swish is an activation function, $f(x) = x \cdot \text{sigmoid}(\beta x)$, where $\beta$ is a learnable parameter. Nearly all implementations do not use the learnable parameter $\beta$, in which case the activation function is $x\sigma(x)$ ("Swish-1").
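To make the definition concrete, here is a minimal NumPy sketch of Swish. The function name and the fixed-$\beta$ default are illustrative choices, not taken from the paper; in a real network, $\beta$ would typically be a trainable scalar in your framework of choice.

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x).

    With beta fixed at 1.0 this reduces to Swish-1, i.e. the SiLU x * sigmoid(x).
    """
    # x * sigmoid(beta * x) simplifies to x / (1 + exp(-beta * x))
    return x / (1.0 + np.exp(-beta * x))

# Swish-1 on a few sample inputs
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(swish(x))            # beta = 1 (Swish-1 / SiLU)
print(swish(x, beta=2.0))  # larger beta sharpens the gating toward ReLU
```

For the fixed $\beta = 1$ case, deep learning frameworks ship this activation directly, e.g. `torch.nn.SiLU` in PyTorch.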

The function $x\sigma(x)$ is exactly the SiLU, which was introduced by other authors before Swish. See Gaussian Error Linear Units (GELUs), where the SiLU (Sigmoid Linear Unit) was originally coined, and see Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning and Swish: a Self-Gated Activation Function, where the same activation function was experimented with later.