
What is: Graph Self-Attention?

Source: BP-Transformer: Modelling Long-Range Context via Binary Partitioning
Year: 2019
Data Source: CC BY-SA - https://paperswithcode.com

Graph Self-Attention (GSA) is a self-attention module used in the BP-Transformer architecture, and is based on the graph attentional layer.

For a given node $u$, we update its representation according to its neighbour nodes, formulated as $\mathbf{h}_{u} \leftarrow \text{GSA}\left(\mathcal{G}, \mathbf{h}_{u}\right)$.

Let $\mathcal{A}\left(u\right)$ denote the set of the neighbour nodes of $u$ in $\mathcal{G}$. $\text{GSA}\left(\mathcal{G}, \mathbf{h}_{u}\right)$ is detailed as follows:

$$\mathbf{A}^{u} = \text{concat}\left(\left\{\mathbf{h}_{v} \mid v \in \mathcal{A}\left(u\right)\right\}\right)$$

$$\mathbf{Q}^{u}_{i} = \mathbf{h}_{u}\mathbf{W}^{Q}_{i}, \quad \mathbf{K}^{u}_{i} = \mathbf{A}^{u}\mathbf{W}^{K}_{i}, \quad \mathbf{V}^{u}_{i} = \mathbf{A}^{u}\mathbf{W}^{V}_{i}$$

$$\text{head}^{u}_{i} = \text{softmax}\left(\frac{\mathbf{Q}^{u}_{i}{\mathbf{K}^{u}_{i}}^{T}}{\sqrt{d}}\right)\mathbf{V}^{u}_{i}$$

$$\text{GSA}\left(\mathcal{G}, \mathbf{h}_{u}\right) = \left[\text{head}^{u}_{1}, \dots, \text{head}^{u}_{h}\right]\mathbf{W}^{O}$$

where $d$ is the dimension of $\mathbf{h}$, and $\mathbf{W}^{Q}_{i}$, $\mathbf{W}^{K}_{i}$ and $\mathbf{W}^{V}_{i}$ are trainable parameters of the $i$-th attention head.
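
To make the steps concrete, below is a minimal sketch of the computation in PyTorch. It is not the authors' implementation: the class and argument names (`GraphSelfAttention`, `h_u`, `neighbours`) are illustrative, the fused Q/K/V projections and per-head scaling follow standard multi-head-attention conventions rather than details confirmed by the paper, and batching plus the surrounding Transformer machinery (residuals, layer normalisation, feed-forward blocks) are omitted.

```python
import math
import torch
import torch.nn as nn


class GraphSelfAttention(nn.Module):
    """Multi-head attention of one node over its neighbour nodes (a sketch)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # W^Q_i, W^K_i, W^V_i for all heads, fused into single projections
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_k = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)  # W^O

    def forward(self, h_u: torch.Tensor, neighbours: torch.Tensor) -> torch.Tensor:
        # h_u:        (d_model,)            representation of node u
        # neighbours: (|A(u)|, d_model)     stacked neighbour representations A^u
        n, _ = neighbours.shape

        # Q^u_i = h_u W^Q_i,  K^u_i = A^u W^K_i,  V^u_i = A^u W^V_i
        q = self.w_q(h_u).view(self.n_heads, 1, self.d_head)
        k = self.w_k(neighbours).view(n, self.n_heads, self.d_head).transpose(0, 1)
        v = self.w_v(neighbours).view(n, self.n_heads, self.d_head).transpose(0, 1)

        # head^u_i = softmax(Q^u_i K^u_i^T / sqrt(d)) V^u_i
        # (the sqrt(d) scaling is applied per head here, a common convention)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)  # (heads, 1, n)
        heads = torch.softmax(scores, dim=-1) @ v                  # (heads, 1, d_head)

        # GSA(G, h_u) = [head^u_1, ..., head^u_h] W^O
        return self.w_o(heads.reshape(-1))                         # (d_model,)


# Usage: update every node of a small hypothetical graph from its neighbour set A(u).
if __name__ == "__main__":
    d_model, n_heads = 16, 4
    gsa = GraphSelfAttention(d_model, n_heads)
    h = torch.randn(5, d_model)  # representations of 5 nodes
    neighbours_of = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
    h_new = torch.stack([gsa(h[u], h[nbrs]) for u, nbrs in neighbours_of.items()])
    print(h_new.shape)  # torch.Size([5, 16])
```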