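AdaMax generalises Adam's update from the $\ell_2$ norm to the $\ell_\infty$ norm. If Adam's second-moment estimate is generalised to an $\ell_p$ norm, with the decay rate parametrised as $\beta_2^p$, it becomes

$$
v_t = \beta_2^p v_{t-1} + (1 - \beta_2^p) |g_t|^p
$$

AdaMax takes the limit $p \to \infty$. To avoid confusion with Adam, we denote the resulting infinity-norm-constrained $v_t$ by $u_t$:

$$
u_t = \lim_{p \to \infty} (v_t)^{1/p} = \max(\beta_2 \cdot u_{t-1}, |g_t|)
$$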
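The claim that the $p \to \infty$ limit collapses to the recursion $u_t = \max(\beta_2 \cdot u_{t-1}, |g_t|)$ can be checked numerically. The Python sketch below is illustrative only: the function names and the gradient sequence are assumptions, not part of any library. It runs the max recursion from $u_0 = 0$ and compares it with $(v_t)^{1/p}$ for increasing $p$, computing the latter in log space so that $|g_t|^p$ does not overflow.

```python
import math


def adamax_u_recursive(grads, beta2):
    """Run u_t = max(beta2 * u_{t-1}, |g_t|) from u_0 = 0 and return the final u_t."""
    u = 0.0
    for g in grads:
        u = max(beta2 * u, abs(g))
    return u


def lp_accumulator_root(grads, beta2, p):
    """Return (v_t)^(1/p) for v_t = beta2^p * v_{t-1} + (1 - beta2^p) * |g_t|^p, v_0 = 0.

    Unrolled, v_t = (1 - beta2^p) * sum_i beta2^(p*(t-i)) * |g_i|^p, which is
    evaluated in log space (log-sum-exp) so that |g_i|^p cannot overflow.
    """
    t = len(grads)
    # log of beta2^(p*(t-i)) * |g_i|^p for i = 1..t; zero gradients add nothing
    logs = [p * (t - i) * math.log(beta2) + p * math.log(abs(g))
            for i, g in enumerate(grads, start=1) if g != 0.0]
    if not logs:
        return 0.0
    top = max(logs)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logs))
    log_v = math.log1p(-beta2 ** p) + log_sum
    return math.exp(log_v / p)


grads = [0.5, -3.0, 1.2, 0.1]  # illustrative gradient sequence for one parameter
beta2 = 0.999
for p in (2, 10, 100, 10_000):
    print(p, lp_accumulator_root(grads, beta2, p))
print("max recursion:", adamax_u_recursive(grads, beta2))  # ≈ 2.994003
```

For small $p$ the factor $(1 - \beta_2^p)^{1/p}$ pulls the value well below the maximum; as $p$ grows that factor tends to 1 and the largest decayed gradient dominates the sum, which is exactly what the max recursion tracks.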
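Plugging $u_t$ into the Adam update equation in place of $\sqrt{\hat{v}_t} + \epsilon$ gives the AdaMax update rule:

$$
\theta_{t+1} = \theta_t - \frac{\eta}{u_t} \hat{m}_t
$$

Because $u_t$ is built from a max operation, it is not biased towards zero the way $m_t$ and $v_t$ are in Adam, so $u_t$ needs no bias correction; $\hat{m}_t$ is still Adam's bias-corrected first-moment estimate.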
Common default values are $\eta = 0.002$, $\beta_1 = 0.9$, and $\beta_2 = 0.999$.
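To make the update rule concrete, here is a minimal AdaMax loop in plain Python with the default $\eta = 0.002$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, assuming element-wise updates over a list of scalar parameters. The names `adamax_minimize` and `grad_quadratic` and the toy objective $f(x, y) = (x - 3)^2 + 10(y + 1)^2$ are hypothetical. The first moment is bias-corrected and $u_t$ is not. The `u > 0` guard is our own addition: the rule has no $\epsilon$, and $u_t = 0$ only if every gradient so far was zero, in which case $m_t = 0$ as well.

```python
def adamax_minimize(grad_fn, theta0, lr=0.002, beta1=0.9, beta2=0.999, steps=5000):
    """Minimise a function with AdaMax, given a callable returning its gradient."""
    theta = list(theta0)
    m = [0.0] * len(theta)  # first moment: EMA of gradients, as in Adam
    u = [0.0] * len(theta)  # exponentially weighted infinity norm of gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for i in range(len(theta)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            u[i] = max(beta2 * u[i], abs(g[i]))
            m_hat = m[i] / (1 - beta1 ** t)  # bias-correct m only; u needs none
            if u[i] > 0.0:  # assumed guard: u == 0 only when all gradients so far were 0
                theta[i] -= lr / u[i] * m_hat
    return theta


def grad_quadratic(theta):
    """Gradient of the toy objective f(x, y) = (x - 3)^2 + 10 * (y + 1)^2."""
    x, y = theta
    return [2.0 * (x - 3.0), 20.0 * (y + 1.0)]


theta = adamax_minimize(grad_quadratic, [0.0, 0.0])
print(theta)  # close to [3.0, -1.0]
```

Since $|\hat{m}_t| / u_t$ stays around 1 or below while the gradient keeps its sign, each step moves a parameter by roughly at most $\eta$; that is why the example runs a few thousand steps to cover a distance of 3.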