What is: Additive Angular Margin Loss?

ArcFace, or Additive Angular Margin Loss, is a loss function used in face recognition tasks. The softmax is traditionally used in these tasks. However, the softmax loss function does not explicitly optimise the feature embedding to enforce higher similarity for intraclass samples and diversity for inter-class samples, which results in a performance gap for deep face recognition under large intra-class appearance variations.

The ArcFace loss transforms the logits $W^{T}\_{j}x\_{i} = || W\_{j} || \text{ } || x\_{i} || \cos\theta\_{j}$ , where $\theta\_{j}$ is the angle between the weight $W\_{j}$ and the feature $x\_{i}$ . The individual weight $|| W\_{j} || = 1$ is fixed by $l\_{2}$ normalization. The embedding feature $||x\_{i} ||$ is fixed by $l\_{2}$ normalization and re-scaled to $s$ . The normalisation step on features and weights makes the predictions only depend on the angle between the feature and the weight. The learned embedding features are thus distributed on a hypersphere with a radius of $s$ . Finally, an additive angular margin penalty $m$ is added between $x\_{i}$ and $W\_{y\_{i}}$ to simultaneously enhance the intra-class compactness and inter-class discrepancy. Since the proposed additive angular margin penalty is equal to the geodesic distance margin penalty in the normalised hypersphere, the method is named ArcFace:

$L\_{3} = -\frac{1}{N}\sum^{N}\_{i=1}\log\frac{e^{s\left(\cos\left(\theta\_{y\_{i}} + m\right)\right)}}{e^{s\left(\cos\left(\theta\_{y\_{i}} + m\right)\right)} + \sum^{n}\_{j=1, j \neq y\_{i}}e^{s\cos\theta\_{j}}}$

The authors select face images from 8 different identities containing enough samples (around 1,500 images/class) to train 2-D feature embedding networks with the softmax and ArcFace loss, respectively. As the Figure shows, the softmax loss provides roughly separable feature embedding but produces noticeable ambiguity in decision boundaries, while the proposed ArcFace loss can obviously enforce a more evident gap between the nearest classes.

Other alternatives to enforce intra-class compactness and inter-class distance include Supervised Contrastive Learning.

Source	ArcFace: Additive Angular Margin Loss for Deep Face Recognition
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com