Viet-Anh on Software Logo

What is: Huber loss?

Year2000
Data SourceCC BY-SA - https://paperswithcode.com

The Huber loss function describes the penalty incurred by an estimation procedure f. Huber (1964) defines the loss function piecewise by[1]

L δ ( a ) = { 1 2 a 2 for  | a | ≤ δ , δ ⋅ ( | a | − 1 2 δ ) , otherwise. {\displaystyle L_{\delta }(a)={\begin{cases}{\frac {1}{2}}{a^{2}}&{\text{for }}|a|\leq \delta ,\\\delta \cdot \left(|a|-{\frac {1}{2}}\delta \right),&{\text{otherwise.}}\end{cases}}}

This function is quadratic for small values of a, and linear for large values, with equal values and slopes of the different sections at the two points where | a | = δ |a|=\delta . The variable a often refers to the residuals, that is to the difference between the observed and predicted values a = y − f ( x ) a=y-f(x), so the former can be expanded to[2]

L δ ( y , f ( x ) ) = { 1 2 ( y − f ( x ) ) 2 for  | y − f ( x ) | ≤ δ , δ   ⋅ ( | y − f ( x ) | − 1 2 δ ) , otherwise. {\displaystyle L_{\delta }(y,f(x))={\begin{cases}{\frac {1}{2}}(y-f(x))^{2}&{\text{for }}|y-f(x)|\leq \delta ,\\\delta \ \cdot \left(|y-f(x)|-{\frac {1}{2}}\delta \right),&{\text{otherwise.}}\end{cases}}}

The Huber loss is the convolution of the absolute value function with the rectangular function, scaled and translated. Thus it "smoothens out" the former's corner at the origin.

.. math:: \ell(x, y) = L = {l_1, ..., l_N}^T

with

.. math::
    l_n = \begin{cases}
    0.5 (x_n - y_n)^2, & \text{if } |x_n - y_n| < delta \\
    delta * (|x_n - y_n| - 0.5 * delta), & \text{otherwise }
    \end{cases}