
What is: Demon Adam?

Source: Demon: Improved Neural Network Training with Momentum Decay
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Demon Adam is a stochastic optimizer that applies the Demon (Decaying Momentum) rule to the Adam optimizer: the momentum parameter is decayed over the course of training according to the schedule

$$\beta_{t} = \beta_{\text{init}} \cdot \frac{1 - \frac{t}{T}}{\left(1 - \beta_{\text{init}}\right) + \beta_{\text{init}}\left(1 - \frac{t}{T}\right)}$$
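Plugging in the endpoints confirms the intended behavior: the effective momentum starts at $\beta_{\text{init}}$ and decays to zero by the final step $T$:

$$\beta_{0} = \beta_{\text{init}} \cdot \frac{1}{\left(1 - \beta_{\text{init}}\right) + \beta_{\text{init}}} = \beta_{\text{init}}, \qquad \beta_{T} = \beta_{\text{init}} \cdot \frac{0}{1 - \beta_{\text{init}}} = 0$$

The momentum and parameter updates then follow Adam's structure: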

$$m_{t, i} = g_{t, i} + \beta_{t} m_{t-1, i}$$

$$v_{t} = \beta_{2} v_{t-1} + \left(1 - \beta_{2}\right) g_{t}^{2}$$

$$\theta_{t} = \theta_{t-1} - \eta \frac{\hat{m}_{t}}{\sqrt{\hat{v}_{t}} + \epsilon}$$
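A minimal NumPy sketch of these updates follows, assuming standard Adam-style bias correction for $\hat{v}_{t}$ and using $m_{t}$ directly, since the page does not specify how $\hat{m}_{t}$ is formed and the Demon accumulation $g_{t} + \beta_{t} m_{t-1}$ is already unscaled; `grad_fn` and `n_steps` are illustrative names, not from the source:

```python
import numpy as np

def demon_adam(grad_fn, theta, n_steps, lr=1e-3, beta_init=0.9,
               beta2=0.999, eps=1e-8):
    """Sketch of Demon Adam following the update rules above."""
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, n_steps + 1):
        g = grad_fn(theta)
        # Demon decay: momentum shrinks from beta_init at t = 0 to 0 at t = T.
        frac = 1.0 - t / n_steps
        beta_t = beta_init * frac / ((1.0 - beta_init) + beta_init * frac)
        # Demon momentum accumulation (no (1 - beta) scaling on the gradient,
        # as in the m-update above).
        m = g + beta_t * m
        # Standard Adam second-moment estimate.
        v = beta2 * v + (1.0 - beta2) * g ** 2
        v_hat = v / (1.0 - beta2 ** t)  # assumed Adam-style bias correction
        # Parameter update.
        theta = theta - lr * m / (np.sqrt(v_hat) + eps)
    return theta
```

For instance, `demon_adam(lambda x: 2 * x, np.array([5.0]), n_steps=1000)` walks a simple quadratic toward its minimum at zero.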