
What is: Conditional Positional Encoding?

Source: Conditional Positional Encodings for Vision Transformers
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

Conditional Positional Encoding, or CPE, is a type of positional encoding for vision transformers. Unlike fixed or learnable positional encodings, which are predefined and independent of the input tokens, CPE is generated dynamically and conditioned on the local neighborhood of the input tokens. As a result, CPE can generalize to input sequences longer than any the model saw during training. CPE also preserves the translation invariance desired in image classification. It can be implemented with a Position Encoding Generator (PEG) and dropped into the standard Transformer framework.
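As a rough illustration of the idea, the sketch below implements a PEG as a 3x3 depthwise convolution over the tokens reshaped into their 2D spatial layout, with the result added back to the tokens as a residual. This is a simplified NumPy sketch, not the paper's implementation: the kernel size, random weights, and function names here are illustrative assumptions. Because each position's encoding is computed only from its local neighborhood, the same weights apply unchanged to longer sequences (larger grids).

```python
import numpy as np

def peg(tokens, kernels, h, w):
    """Sketch of a Position Encoding Generator (PEG).

    tokens : (h*w, c) array of patch tokens.
    kernels: (c, 3, 3) depthwise conv kernels, one per channel (illustrative).
    Returns tokens plus a conditional positional encoding, shape (h*w, c).
    """
    c = tokens.shape[1]
    x = tokens.reshape(h, w, c)
    # Zero padding keeps the output the same size as the input grid.
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    pe = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            # Encoding at (i, j) depends only on the 3x3 local neighborhood.
            patch = padded[i:i + 3, j:j + 3, :]        # (3, 3, c)
            pe[i, j] = np.einsum('ijc,cij->c', patch, kernels)
    # Residual connection: add the generated encoding to the input tokens.
    return (x + pe).reshape(h * w, c)

rng = np.random.default_rng(0)
kernels = rng.standard_normal((8, 3, 3))
# Same kernels handle a 4x4 grid and an 8x8 grid (a longer sequence).
out_small = peg(rng.standard_normal((4 * 4, 8)), kernels, 4, 4)
out_large = peg(rng.standard_normal((8 * 8, 8)), kernels, 8, 8)
```

Note that nothing in `peg` depends on `h` or `w` at training time: the same convolution weights produce encodings for any grid size, which is how CPE generalizes to longer input sequences.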