
What is: Gated Positional Self-Attention?

Source: ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

Gated Positional Self-Attention (GPSA) is a self-attention module for vision transformers, introduced in the ConViT architecture, whose positional attention can be initialized to act like a convolutional layer. Each head combines content-based attention with positional attention through a learned gating parameter, giving the ViT a soft inductive bias toward locality that it can strengthen or discard during training.
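Since the entry only sketches the idea, below is a minimal, illustrative PyTorch implementation of GPSA, assuming a square patch grid and a square number of heads. Names such as `local_init` and `locality_strength` echo the spirit of the paper's reference code, but the details here are simplified assumptions, not the official ConViT implementation.

```python
# Minimal GPSA sketch (assumptions: square patch grid, square num_heads).
import torch
import torch.nn as nn

class GPSA(nn.Module):
    def __init__(self, dim, num_heads=9, locality_strength=1.0):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qk = nn.Linear(dim, dim * 2, bias=False)    # content attention
        self.v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.pos_proj = nn.Linear(3, num_heads)          # positional attention
        # One gate per head; starting at 1 means sigmoid(1) ~ 0.73, so the
        # positional (convolution-like) term dominates early in training.
        self.gating_param = nn.Parameter(torch.ones(num_heads))
        self.locality_strength = locality_strength

    def get_rel_indices(self, num_patches, device):
        # Relative position encodings r_ij = (dy, dx, squared distance).
        # Recomputed each call for simplicity; cache in a real implementation.
        size = int(num_patches ** 0.5)
        coords = torch.stack(torch.meshgrid(
            torch.arange(size), torch.arange(size), indexing="ij"), dim=-1)
        coords = coords.reshape(-1, 2).float()            # (N, 2)
        rel = coords[None, :, :] - coords[:, None, :]     # (N, N, 2)
        dist2 = (rel ** 2).sum(-1, keepdim=True)          # (N, N, 1)
        return torch.cat([rel, dist2], dim=-1).to(device) # (N, N, 3)

    def local_init(self):
        # Initialize positional attention to mimic a convolution: each head's
        # score is -s * ||delta - Delta_h||^2 (up to a constant), so softmax
        # peaks at one offset Delta_h of a sqrt(H) x sqrt(H) kernel.
        kernel = int(self.num_heads ** 0.5)
        center = (kernel - 1) / 2
        s = self.locality_strength
        for h in range(self.num_heads):
            dy, dx = divmod(h, kernel)
            self.pos_proj.weight.data[h, 0] = 2 * (dy - center) * s
            self.pos_proj.weight.data[h, 1] = 2 * (dx - center) * s
            self.pos_proj.weight.data[h, 2] = -1 * s

    def forward(self, x):
        B, N, C = x.shape
        rel = self.get_rel_indices(N, x.device)                   # (N, N, 3)
        q, k = self.qk(x).reshape(B, N, 2, self.num_heads, self.head_dim) \
                         .permute(2, 0, 3, 1, 4)                  # (B, H, N, d) each
        content = (q @ k.transpose(-2, -1)) * self.scale          # (B, H, N, N)
        pos = self.pos_proj(rel).permute(2, 0, 1).unsqueeze(0)    # (1, H, N, N)
        # Gated convex combination of content and positional attention.
        gate = torch.sigmoid(self.gating_param).view(1, -1, 1, 1)
        attn = (1 - gate) * content.softmax(-1) + gate * pos.softmax(-1)
        attn = attn / attn.sum(dim=-1, keepdim=True)              # renormalize
        v = self.v(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Usage: 9 heads form a 3x3 convolutional kernel at initialization.
layer = GPSA(dim=144, num_heads=9)
layer.local_init()
y = layer(torch.randn(2, 196, 144))  # 14x14 patch grid
```

After `local_init`, each head attends almost entirely to one fixed neighboring patch, like one tap of a 3x3 convolution; as training proceeds, the gates let individual heads shift toward content-based attention if that helps.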