The **Swin Transformer** is a type of [Vision Transformer](https://paperswithcode.com/method/vision-transformer). It builds hierarchical feature maps by merging image patches (shown in gray) in deeper layers, and its computational complexity is linear in the input image size because self-attention is computed only within each local window (shown in red). It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. In contrast, previous vision Transformers produce feature maps at a single low resolution and have computational complexity that is quadratic in the input image size because self-attention is computed globally.
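The linear-versus-quadratic claim can be checked with a little arithmetic. The sketch below uses the per-layer cost estimates given in the Swin Transformer paper (they are not stated in the text above): $4hwC^2 + 2(hw)^2C$ for global self-attention and $4hwC^2 + 2M^2hwC$ for attention within $M \times M$ windows, where $h \times w$ is the number of patches and $C$ the channel dimension. The function names and the values $C = 96$, $M = 7$ are illustrative assumptions.

```python
def global_attention_cost(h, w, c):
    """Estimated cost of global self-attention over an h x w patch grid.

    The (h * w) ** 2 term makes this quadratic in the number of patches.
    """
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_attention_cost(h, w, c, m):
    """Estimated cost when attention is restricted to m x m local windows.

    With m fixed, every term is proportional to h * w, so the cost is linear.
    """
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


# Illustrative values (assumptions): 96 channels, 7 x 7 windows.
channels, window = 96, 7
for side in (56, 112, 224):
    global_cost = global_attention_cost(side, side, channels)
    local_cost = window_attention_cost(side, side, channels, window)
    print(f"{side}x{side} patches: global/window cost ratio = "
          f"{global_cost / local_cost:.1f}")
```

Doubling the side length quadruples the number of patches. The windowed cost grows by exactly that factor, while the global cost grows faster because of its quadratic term.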

SENet pioneered channel attention. The core of SENet is the squeeze-and-excitation (SE) block, which collects global information, captures channel-wise relationships, and improves representation ability.
An SE block has two parts: a squeeze module and an excitation module. The squeeze module collects global spatial information by global average pooling (GAP). The excitation module captures channel-wise relationships and outputs an attention vector using fully-connected layers and non-linear layers (ReLU and sigmoid). Each channel of the input feature is then scaled by multiplying it by the corresponding element of the attention vector. Overall, a squeeze-and-excitation block $F_\text{se}$ (with parameter $\theta$) that takes $X$ as input and outputs $Y$ can be formulated as:
\begin{align}
    s &= F_\text{se}(X, \theta) = \sigma (W_{2} \delta (W_{1}\text{GAP}(X))) \\
    Y &= sX
\end{align}
where $\delta$ is the ReLU, $\sigma$ is the sigmoid, and $W_{1}$, $W_{2}$ are the weights of the fully-connected layers.
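The two equations above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the reference SENet implementation: the function name `se_block`, the channel-first `(C, H, W)` layout, the bias-free layers, and the reduction ratio `r` that sets the hidden width `C // r` are all assumptions made for the example.

```python
import numpy as np


def se_block(x, w1, w2):
    """Apply a squeeze-and-excitation block to one feature map.

    x:  input feature X, shape (C, H, W)
    w1: first fully-connected weight W_1, shape (C // r, C)   (r is assumed)
    w2: second fully-connected weight W_2, shape (C, C // r)
    Returns the attention vector s (shape (C,)) and the output Y = sX.
    """
    z = x.mean(axis=(1, 2))                      # squeeze: GAP(X), shape (C,)
    hidden = np.maximum(w1 @ z, 0.0)             # delta: ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigma: sigmoid, values in (0, 1)
    y = s[:, None, None] * x                     # scale each channel by its weight
    return s, y


# Illustrative sizes and random weights (assumptions, not trained values).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

s, y = se_block(x, w1, w2)
print(s.shape, y.shape)
```

Because the sigmoid keeps every entry of `s` strictly between 0 and 1, the block can only attenuate channels relative to one another; it never changes the spatial layout of the feature map.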

Channel attention

Squeeze-and-Excitation Networks

Swin Transformer

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

**Geomancer** is a nonparametric algorithm for symmetry-based disentangling of data manifolds. It learns a set of subspaces to assign to each point in the dataset, where each subspace is the tangent space of one disentangled submanifold. This means that Geomancer can be used to disentangle manifolds for which there may not be a global axis-aligned coordinate system.

Source	Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Year	2021
Data Source	CC BY-SA - https://paperswithcode.com