
What is: ConViT?

Source: ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

ConViT is a type of vision transformer that uses a gated positional self-attention (GPSA) module, a form of positional self-attention that can be equipped with a "soft" convolutional inductive bias. The GPSA layers are initialized to mimic the locality of convolutional layers; each attention head is then free to escape locality by adjusting a gating parameter that regulates how much attention is paid to positional versus content information.
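
The sketch below illustrates the gating idea: each head blends a content-based attention map with a positional attention map through a learnable per-head gate. This is a minimal, simplified version for illustration only, not the authors' implementation; the module name `GatedPositionalSelfAttention`, the 3-dimensional relative-position encoding, and the gate initialization are assumptions made for this example.

```python
import torch
import torch.nn as nn


class GatedPositionalSelfAttention(nn.Module):
    """Simplified GPSA-style layer: blends content attention and positional
    attention with a learnable per-head gate (illustrative sketch only)."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5

        self.qk = nn.Linear(dim, dim * 2, bias=False)   # content queries/keys
        self.v = nn.Linear(dim, dim, bias=False)        # values
        self.proj = nn.Linear(dim, dim)

        # Positional attention: one score per head for each pair of patches,
        # computed from a simple relative-position feature (assumed here to be
        # [delta_x, delta_y, squared distance]).
        self.pos_proj = nn.Linear(3, num_heads)

        # Per-head gating parameter: sigmoid(gate) near 1 favours positional
        # (convolution-like, local) attention, near 0 favours content attention.
        self.gate = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x, rel_pos):
        # x: (B, N, dim) patch embeddings; rel_pos: (N, N, 3) relative encodings
        B, N, dim = x.shape
        q, k = self.qk(x).chunk(2, dim=-1)
        q = q.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v(x).view(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        # Content attention: standard scaled dot-product between patches.
        content_attn = (q @ k.transpose(-2, -1)) * self.scale        # (B, H, N, N)
        content_attn = content_attn.softmax(dim=-1)

        # Positional attention: depends only on relative patch positions.
        pos_attn = self.pos_proj(rel_pos).permute(2, 0, 1)           # (H, N, N)
        pos_attn = pos_attn.softmax(dim=-1).unsqueeze(0)             # (1, H, N, N)

        # Gate the two attention maps per head.
        g = torch.sigmoid(self.gate).view(1, -1, 1, 1)
        attn = (1.0 - g) * content_attn + g * pos_attn

        out = (attn @ v).transpose(1, 2).reshape(B, N, dim)
        return self.proj(out)
```

In the paper, the gates are initialized so that positional attention dominates and each head initially attends to a distinct local neighbourhood, mimicking a convolution; during training the heads can learn to lower their gates and rely more on content, escaping locality when it helps.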