
What is: Additive Angular Margin Loss?

Source: ArcFace: Additive Angular Margin Loss for Deep Face Recognition
Year: 2018
Data Source: CC BY-SA - https://paperswithcode.com

ArcFace, or Additive Angular Margin Loss, is a loss function used in face recognition tasks. The softmax loss is traditionally used in these tasks. However, it does not explicitly optimise the feature embedding to enforce higher similarity for intra-class samples and diversity for inter-class samples, which results in a performance gap for deep face recognition under large intra-class appearance variations.

The ArcFace loss transforms the logits $W^{T}_{j}x_{i} = \|W_{j}\|\,\|x_{i}\|\cos\theta_{j}$, where $\theta_{j}$ is the angle between the weight $W_{j}$ and the feature $x_{i}$. The individual weight is fixed to $\|W_{j}\| = 1$ by $\ell_{2}$ normalisation. The embedding feature norm $\|x_{i}\|$ is fixed by $\ell_{2}$ normalisation and re-scaled to $s$. The normalisation step on features and weights makes the predictions depend only on the angle between the feature and the weight. The learned embedding features are thus distributed on a hypersphere with a radius of $s$. Finally, an additive angular margin penalty $m$ is added between $x_{i}$ and $W_{y_{i}}$ to simultaneously enhance the intra-class compactness and inter-class discrepancy. Since the proposed additive angular margin penalty is equal to the geodesic distance margin penalty on the normalised hypersphere, the method is named ArcFace:
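The normalisation step can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the helper names `l2_normalise`, `cosines` and `scaled_logits` are hypothetical, and the default `s = 64.0` is an assumed value chosen only for illustration. It shows that once $W_{j}$ and $x_{i}$ are $\ell_{2}$ normalised, the logit reduces to $s\cos\theta_{j}$, so rescaling the raw feature or weights leaves the logits unchanged.

```python
import math
from typing import List, Sequence


def l2_normalise(v: Sequence[float]) -> List[float]:
    """Return v scaled to unit l2 norm."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [c / norm for c in v]


def cosines(x: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """cos(theta_j) between feature x and each class weight W_j.

    After l2 normalisation ||W_j|| = ||x|| = 1, so W_j^T x = cos(theta_j).
    """
    x_hat = l2_normalise(x)
    return [sum(a * b for a, b in zip(l2_normalise(w), x_hat)) for w in weights]


def scaled_logits(
    x: Sequence[float], weights: Sequence[Sequence[float]], s: float = 64.0
) -> List[float]:
    """Logits s * cos(theta_j): unit-norm weights, feature norm re-scaled to s.

    The default s = 64.0 is an illustrative assumption.
    """
    return [s * c for c in cosines(x, weights)]


# Only the angles matter: rescaling the feature or the weights changes nothing.
print(scaled_logits([3.0, 4.0], [[1.0, 0.0], [0.0, 2.0]], s=10.0))      # → [6.0, 8.0]
print(scaled_logits([300.0, 400.0], [[5.0, 0.0], [0.0, 0.5]], s=10.0))  # → [6.0, 8.0]
```

Because every logit is bounded by $\pm s$, the scale $s$ controls how peaked the subsequent softmax can become.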

$$L_{3} = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_{i}} + m)}}{e^{s\cos(\theta_{y_{i}} + m)} + \sum_{j=1, j \neq y_{i}}^{n} e^{s\cos\theta_{j}}}$$

where $N$ is the batch size and $n$ is the number of classes.
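A direct transcription of $L_{3}$ into plain Python may help make the formula concrete. This is a minimal sketch under stated assumptions, not a reference implementation: `arcface_loss` is a hypothetical helper, it takes precomputed cosines $\cos\theta_{j}$ (from normalised features and weights) rather than raw vectors, and the defaults `s = 64.0`, `m = 0.5` are illustrative. The margin is added only to the target angle $\theta_{y_{i}}$, and the log-softmax uses the usual max-subtraction for numerical stability.

```python
import math
from typing import Sequence


def arcface_loss(
    cosines: Sequence[Sequence[float]],
    labels: Sequence[int],
    s: float = 64.0,
    m: float = 0.5,
) -> float:
    """Mean ArcFace loss L_3 over a batch (illustrative sketch).

    cosines[i][j] is cos(theta_j) for sample i, computed from l2-normalised
    features and weights; labels[i] is the target class y_i.
    """
    total = 0.0
    for cos_row, y in zip(cosines, labels):
        # Recover theta_{y_i}; the clamp guards against rounding just outside [-1, 1].
        theta_y = math.acos(max(-1.0, min(1.0, cos_row[y])))
        # Every logit is s * cos(theta_j) ...
        logits = [s * c for c in cos_row]
        # ... except the target, which gets the additive angular margin.
        logits[y] = s * math.cos(theta_y + m)
        # Numerically stable log of the softmax denominator.
        max_logit = max(logits)
        log_denom = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
        # -log(softmax probability of the target class).
        total += log_denom - logits[y]
    return total / len(labels)


# One sample, two classes, target class 0.
batch_cos = [[0.8, 0.6]]
print(arcface_loss(batch_cos, [0], s=1.0, m=0.0))  # m = 0: normalised softmax loss
print(arcface_loss(batch_cos, [0], s=1.0, m=0.5))  # larger: the margin penalises the target
```

With `m = 0` the function reduces to a softmax cross-entropy on the scaled cosines. Practical implementations often add special handling for the case $\theta_{y_{i}} + m > \pi$, where $\cos(\theta + m)$ stops decreasing; this sketch omits it.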

The authors select face images from 8 different identities with enough samples (around 1,500 images per class) to train 2-D feature embedding networks with the softmax and ArcFace losses, respectively. As the Figure shows, the softmax loss provides a roughly separable feature embedding but produces noticeable ambiguity in the decision boundaries, while the proposed ArcFace loss enforces a clearly visible gap between the nearest classes.
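A small worked derivation may help show where that gap comes from. The setup is an illustrative assumption, not the authors' experiment: two classes whose normalised weights $W_{1}$ and $W_{2}$ are separated by an angle $\varphi$, and a feature lying on the arc between them, so that $\theta_{1} + \theta_{2} = \varphi$. Assuming all angles involved (including $\theta_{j} + m$) stay in $[0, \pi]$, where cosine is decreasing, the margin-augmented target logit beats the other class only when:

$$\text{class } 1:\quad \cos(\theta_{1} + m) > \cos\theta_{2} \iff \theta_{1} + m < \theta_{2} \iff \theta_{1} < \frac{\varphi - m}{2}$$

$$\text{class } 2:\quad \cos(\theta_{2} + m) > \cos\theta_{1} \iff \theta_{2} + m < \theta_{1} \iff \theta_{1} > \frac{\varphi + m}{2}$$

Neither condition holds for features inside the band $\frac{\varphi - m}{2} \le \theta_{1} \le \frac{\varphi + m}{2}$, whose angular width is $m$, so minimising the loss tends to push features of both classes out of it. With $m = 0$ the two conditions meet at the single boundary $\theta_{1} = \varphi / 2$ and no such gap is enforced.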

Other approaches for enforcing intra-class compactness and inter-class distance include Supervised Contrastive Learning.