**Layer-wise Adaptive Rate Scaling**, or **LARS**, is a large-batch optimization technique. It differs from other adaptive algorithms such as [Adam](https://paperswithcode.com/method/adam) or [RMSProp](https://paperswithcode.com/method/rmsprop) in two notable ways. First, LARS uses a separate learning rate for each layer rather than for each weight. Second, the magnitude of the update is scaled relative to the weight norm, which gives better control of training speed.

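$$m_{t} = \beta_{1} m_{t-1} + \left(1-\beta_{1}\right)\left(g_{t} + \lambda x_{t}\right)$$
$$x_{t+1}^{\left(i\right)} = x_{t}^{\left(i\right)} - \eta_{t}\frac{\phi\left(\| x_{t}^{\left(i\right)} \|\right)}{\| m_{t}^{\left(i\right)} \|} m_{t}^{\left(i\right)}$$

Here $x_{t}^{\left(i\right)}$ denotes the weights of layer $i$ at step $t$, $g_{t}$ the gradient, $m_{t}$ the momentum, $\beta_{1}$ the momentum coefficient, $\lambda$ the weight decay coefficient, $\eta_{t}$ the learning rate, and $\phi$ a scaling function applied to the layer's weight norm.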
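The LARS update can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `lars_step`, the default hyperparameter values, the identity choice for $\phi$, and the small `eps` added to the denominator to avoid division by zero are all hypothetical choices made for the example.

```python
import numpy as np

def lars_step(params, grads, momenta, lr, beta1=0.9, weight_decay=1e-4,
              phi=lambda z: z, eps=1e-12):
    """Apply one LARS-style update to a list of per-layer arrays.

    params, grads, momenta: lists of arrays, one entry per layer.
    phi: scaling function applied to the layer weight norm (identity here,
         which is an assumption of this sketch).
    eps: numerical safeguard, not part of the formula.
    """
    new_params, new_momenta = [], []
    for x, g, m in zip(params, grads, momenta):
        # Momentum on the gradient plus the weight decay term.
        m_new = beta1 * m + (1.0 - beta1) * (g + weight_decay * x)
        # Layer-wise ratio: phi(||x||) / ||m||.
        ratio = phi(np.linalg.norm(x)) / (np.linalg.norm(m_new) + eps)
        new_params.append(x - lr * ratio * m_new)
        new_momenta.append(m_new)
    return new_params, new_momenta

# Two hypothetical layers with random weights and gradients.
rng = np.random.default_rng(0)
params = [rng.normal(size=(4, 3)), rng.normal(size=(3,))]
grads = [rng.normal(size=p.shape) for p in params]
momenta = [np.zeros_like(p) for p in params]

new_params, new_momenta = lars_step(params, grads, momenta, lr=0.1)
```

With the identity $\phi$, the norm of each layer's update equals the learning rate times that layer's weight norm (up to `eps`), regardless of the gradient's magnitude. This is the sense in which the update is controlled with respect to the weight norm.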

**UNETR**, or **UNet Transformer**, is a [Transformer](https://paperswithcode.com/methods/category/transformers)-based architecture for [medical image segmentation](https://paperswithcode.com/task/medical-image-segmentation) that uses a pure [transformer](https://paperswithcode.com/method/transformer) as the encoder to learn sequence representations of the input volume, effectively capturing global multi-scale information. As in a [U-Net](https://paperswithcode.com/method/u-net), the transformer encoder is connected directly to a decoder via [skip connections](https://paperswithcode.com/methods/category/skip-connections) at different resolutions to compute the final semantic segmentation output.
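To make "sequence representations of the input volume" concrete, the sketch below shows one way a 3D volume can be turned into a token sequence for a transformer encoder. It assumes the volume is split into non-overlapping cubic patches that are flattened and linearly projected. The function name `volume_to_patch_sequence`, the 32×32×32 input, the patch size of 16, and the embedding width of 64 are hypothetical values chosen for illustration, and the random projection stands in for a learned one.

```python
import numpy as np

def volume_to_patch_sequence(volume, patch):
    """Flatten a (C, H, W, D) volume into a sequence of non-overlapping patches.

    Returns an array of shape (num_patches, patch**3 * C).
    """
    c, h, w, d = volume.shape
    assert h % patch == 0 and w % patch == 0 and d % patch == 0
    nh, nw, nd = h // patch, w // patch, d // patch
    # Split each spatial axis into (number of patches, patch size).
    x = volume.reshape(c, nh, patch, nw, patch, nd, patch)
    # Reorder to (nh, nw, nd, patch, patch, patch, C).
    x = x.transpose(1, 3, 5, 2, 4, 6, 0)
    return x.reshape(nh * nw * nd, patch ** 3 * c)

# Hypothetical single-channel volume.
volume = np.arange(32 ** 3, dtype=np.float32).reshape(1, 32, 32, 32)
tokens = volume_to_patch_sequence(volume, patch=16)   # shape (8, 4096)

# Random matrix standing in for a learned linear projection (assumption).
rng = np.random.default_rng(0)
projection = rng.normal(scale=0.02, size=(tokens.shape[1], 64))
embedded = tokens @ projection                        # shape (8, 64)
```

The resulting sequence of 8 tokens is what a transformer encoder would consume. In a U-Net-like design, intermediate encoder outputs would be reshaped back to 3D grids and passed to the decoder through the skip connections.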

UNETR

UNETR: Transformers for 3D Medical Image Segmentation

LARS

Large Batch Training of Convolutional Networks

**PASE+** is a problem-agnostic speech encoder that combines a convolutional encoder with multiple neural networks, called workers, each tasked with solving a self-supervised problem (i.e., one that does not require manual annotations as ground truth). An online speech distortion module contaminates the input signals with a variety of random disturbances. A revised encoder better learns short- and long-term speech dynamics through an efficient combination of recurrent and convolutional networks. Finally, the authors refine the set of workers used in self-supervision to encourage better cooperation.
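The idea of an online distortion module can be illustrated with a small NumPy sketch that randomly contaminates a waveform each time it is called. The specific disturbances used here (additive noise, amplitude clipping, and zeroing a short time span), their probabilities, and their parameter ranges are assumptions made for this example and are not taken from the description above. The function name `distort` is likewise hypothetical.

```python
import numpy as np

def distort(wave, rng, noise_prob=0.5, clip_prob=0.5, mask_prob=0.5):
    """Return a randomly contaminated copy of a 1-D waveform."""
    out = wave.copy()
    if rng.random() < noise_prob:
        # Additive Gaussian noise at a random signal-to-noise ratio (dB).
        snr_db = rng.uniform(0.0, 20.0)
        signal_power = np.mean(out ** 2) + 1e-12
        noise_power = signal_power / (10.0 ** (snr_db / 10.0))
        out = out + rng.normal(scale=np.sqrt(noise_power), size=out.shape)
    if rng.random() < clip_prob:
        # Clip amplitudes to a random fraction of the current peak.
        limit = rng.uniform(0.3, 0.8) * np.max(np.abs(out))
        out = np.clip(out, -limit, limit)
    if rng.random() < mask_prob:
        # Zero out a random contiguous span of samples.
        width = int(rng.integers(1, max(2, len(out) // 10)))
        start = int(rng.integers(0, len(out) - width + 1))
        out[start:start + width] = 0.0
    return out

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
wave = np.sin(2.0 * np.pi * 440.0 * t)

rng = np.random.default_rng(0)
distorted = distort(wave, rng)
```

Because the distortions are drawn on the fly, each training pass sees a differently contaminated version of the same utterance, while the workers' targets can still be computed from the clean signal.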

Source	Large Batch Training of Convolutional Networks
Year	2017
Data Source	CC BY-SA - https://paperswithcode.com