
What is: Cross-Attention Module?

Source: CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

The Cross-Attention module is an attention module used in CrossViT for the fusion of multi-scale features. The CLS token of the large branch serves as a query token that interacts with the patch tokens from the small branch through attention, where $f(\cdot)$ and $g(\cdot)$ are projections that align the dimensions of the two branches. The small branch follows the same procedure, but with the CLS and patch tokens of the two branches swapped.
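Below is a minimal PyTorch sketch of this cross-attention step, assuming a `CrossAttention` module with hypothetical parameter names (`dim_large`, `dim_small`, `num_heads`); it illustrates the idea of the large-branch CLS token attending to small-branch patch tokens, not the paper's reference implementation.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Sketch: large-branch CLS token attends to small-branch patch tokens."""

    def __init__(self, dim_large, dim_small, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim_small // num_heads
        self.scale = self.head_dim ** -0.5
        # f(.): project the large-branch CLS token into the small-branch dimension
        self.f = nn.Linear(dim_large, dim_small)
        # attention projections; only the CLS token acts as the query
        self.wq = nn.Linear(dim_small, dim_small)
        self.wk = nn.Linear(dim_small, dim_small)
        self.wv = nn.Linear(dim_small, dim_small)
        self.proj = nn.Linear(dim_small, dim_small)
        # g(.): project the fused CLS token back to the large-branch dimension
        self.g = nn.Linear(dim_small, dim_large)

    def forward(self, cls_large, patches_small):
        # cls_large:     (B, 1, dim_large)  CLS token of the large branch
        # patches_small: (B, N, dim_small)  patch tokens of the small branch
        B, N, _ = patches_small.shape
        cls = self.f(cls_large)                          # align dimensions
        tokens = torch.cat([cls, patches_small], dim=1)  # (B, 1+N, dim_small)

        q = self.wq(cls).reshape(B, 1, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.wk(tokens).reshape(B, 1 + N, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.wv(tokens).reshape(B, 1 + N, self.num_heads, self.head_dim).transpose(1, 2)

        attn = (q @ k.transpose(-2, -1)) * self.scale    # (B, heads, 1, 1+N)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, 1, -1)
        out = self.proj(out)

        # residual connection, then project back to the large-branch dimension
        return cls_large + self.g(out)
```

Because only the single CLS token forms the query, the attention map has shape (B, heads, 1, 1+N), which keeps the fusion step linear in the number of patch tokens; the mirrored module for the small branch would swap the roles of the two branches.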