
What is: Self-Adjusting Smooth L1 Loss?

SourceRetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free
Year2019
Data SourceCC BY-SA - https://paperswithcode.com

Self-Adjusting Smooth L1 Loss is a loss function used in object detection that was introduced with RetinaMask. This is an improved version of Smooth L1. For Smooth L1 loss we have:

f(x) = \begin{cases} 0.5 \dfrac{x^{2}}{\beta} & \text{if } |x| < \beta \\ |x| - 0.5\beta & \text{otherwise} \end{cases}

Here the control point \beta splits the positive axis into two regions: an L2 (quadratic) loss is used for residuals in the range [0, \beta], and an L1 (linear) loss is used beyond \beta to avoid over-penalizing outliers. The overall function is smooth (continuous, together with its derivative). However, the choice of the control point \beta is heuristic and is usually made by hyperparameter search.
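As a reference point, the plain Smooth L1 loss above can be written in a few lines of NumPy; note how the two branches agree at |x| = \beta, which is what makes the function smooth:

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    """Smooth L1 loss: quadratic for |x| < beta, linear otherwise."""
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)

# Both branches give 0.5 * beta at |x| = beta, so the loss is continuous.
```

For example, with beta = 1.0, an error of 0.5 falls in the quadratic region (loss 0.125), while an error of 2.0 falls in the linear region (loss 1.5).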

Instead, with Self-Adjusting Smooth L1 loss, the running mean and variance of the absolute loss are recorded inside the loss function itself. These two statistics are updated from each minibatch's mean and variance with a momentum of 0.9, and are then used to set the control point \beta automatically.
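A minimal sketch of this idea is below. The text above only specifies that running statistics of the absolute loss are tracked with momentum 0.9; the specific rule for deriving \beta from them (clipping the running mean minus the running variance into [0, beta_max]) follows the RetinaMask paper's formulation and should be treated as an assumption here, as are the names `SelfAdjustingSmoothL1` and `beta_max`:

```python
import numpy as np

class SelfAdjustingSmoothL1:
    """Sketch of Self-Adjusting Smooth L1 loss.

    Tracks the running mean and variance of |x| with momentum 0.9 and
    derives the control point beta from them each call. The derivation
    rule beta = clip(mean - var, 0, beta_max) is an assumption taken
    from the RetinaMask paper's description.
    """

    def __init__(self, momentum=0.9, beta_max=1.0):
        self.momentum = momentum
        self.beta_max = beta_max
        self.running_mean = 0.0
        self.running_var = 0.0

    def __call__(self, x):
        ax = np.abs(x)
        # Update running statistics from the current minibatch.
        m = self.momentum
        self.running_mean = m * self.running_mean + (1 - m) * float(ax.mean())
        self.running_var = m * self.running_var + (1 - m) * float(ax.var())
        # Derive beta; the small lower bound avoids division by zero.
        beta = float(np.clip(self.running_mean - self.running_var,
                             1e-9, self.beta_max))
        return np.where(ax < beta, 0.5 * ax ** 2 / beta, ax - 0.5 * beta)
```

Because \beta is recomputed from the data distribution at every step, no hyperparameter search over the control point is needed; it shrinks or grows as the residuals do.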