**Residual Networks**, or **ResNets**, learn residual functions with reference to the layer inputs, instead of learning unreferenced functions. Rather than hoping that every few stacked layers directly fit a desired underlying mapping, residual nets let these layers fit a residual mapping. They stack [residual blocks](https://paperswithcode.com/method/residual-block) on top of each other to form a network: e.g. a ResNet-50 has fifty layers using these blocks.

Formally, denoting the desired underlying mapping as $\mathcal{H}(x)$, we let the stacked nonlinear layers fit another mapping of $\mathcal{F}(x):=\mathcal{H}(x)-x$. The original mapping is recast into $\mathcal{F}(x)+x$.
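The recast mapping $\mathcal{F}(x)+x$ can be illustrated with a minimal pure-Python sketch. The names `residual_block`, `zero_residual`, and `toy_residual` are hypothetical and chosen for illustration; `toy_residual` is only a stand-in for the stacked nonlinear layers, not the layers used in the paper.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Return F(x) + x, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    # The identity shortcut adds x element-wise, so F(x) must match x in size.
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same length as x")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x: Vector) -> Vector:
    # If the stacked layers output zero, the block reduces to the identity.
    return [0.0 for _ in x]


def toy_residual(x: Vector) -> Vector:
    # Hypothetical stand-in for stacked nonlinear layers: scale, then ReLU.
    return [max(0.0, 0.5 * xi) for xi in x]


if __name__ == "__main__":
    x = [1.0, -2.0]
    print(residual_block(x, zero_residual))  # identity mapping
    print(residual_block(x, toy_residual))   # F(x) + x
```

With `zero_residual` the block returns its input unchanged, which shows that an identity mapping only requires driving $\mathcal{F}(x)$ to zero.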

There is empirical evidence that these types of network are easier to optimize, and can gain accuracy from considerably increased depth.

**Inception-ResNet-v2 Reduction-B** is an image model block used in the [Inception-ResNet-v2](https://paperswithcode.com/method/inception-resnet-v2) architecture.

Inception-ResNet-v2 Reduction-B: Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

ResNet: Deep Residual Learning for Image Recognition

**RMSProp** is an unpublished adaptive learning rate optimizer [proposed by Geoff Hinton](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf). The motivation is that the magnitude of gradients can differ for different weights, and can change during learning, making it hard to choose a single global learning rate. RMSProp tackles this by keeping a moving average of the squared gradient for each weight and dividing that weight's update by the square root of this average. The parameter updates are performed as:

$$E\left[g^{2}\right]\_{t} = \gamma E\left[g^{2}\right]\_{t-1} + \left(1 - \gamma\right) g^{2}\_{t}$$

$$\theta\_{t+1} = \theta\_{t} - \frac{\eta}{\sqrt{E\left[g^{2}\right]\_{t} + \epsilon}}g\_{t}$$

Hinton suggests $\gamma=0.9$, with a good default for $\eta$ as $0.001$.
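The two update equations can be sketched as a single step function in plain Python. This is a minimal illustration, not a reference implementation: the name `rmsprop_step` is hypothetical, and the value `eps=1e-8` is an assumption, since the text above gives no value for $\epsilon$.

```python
import math
from typing import List, Tuple


def rmsprop_step(
    theta: List[float],
    grad: List[float],
    avg_sq: List[float],
    lr: float = 0.001,    # eta, the default mentioned above
    gamma: float = 0.9,   # decay rate suggested by Hinton
    eps: float = 1e-8,    # assumed value; not specified in the text
) -> Tuple[List[float], List[float]]:
    """Apply one RMSProp update and return (new_theta, new_avg_sq)."""
    # E[g^2]_t = gamma * E[g^2]_{t-1} + (1 - gamma) * g_t^2
    new_avg_sq = [gamma * e + (1.0 - gamma) * g * g for e, g in zip(avg_sq, grad)]
    # theta_{t+1} = theta_t - eta / sqrt(E[g^2]_t + eps) * g_t
    new_theta = [
        t - lr / math.sqrt(e + eps) * g
        for t, e, g in zip(theta, new_avg_sq, grad)
    ]
    return new_theta, new_avg_sq


if __name__ == "__main__":
    # Toy objective f(theta) = theta^2, whose gradient is 2 * theta.
    theta, avg_sq = [1.0], [0.0]
    for _ in range(100):
        grad = [2.0 * t for t in theta]
        theta, avg_sq = rmsprop_step(theta, grad, avg_sq)
    print(theta)
```

Because each weight keeps its own running average, weights with consistently large gradients take proportionally smaller steps than weights with small gradients.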


Source	Deep Residual Learning for Image Recognition
Year	2015
Data Source	CC BY-SA - https://paperswithcode.com