AICurious Logo

What is: ConvBERT?

SourceConvBERT: Improving BERT with Span-based Dynamic Convolution
Year2000
Data SourceCC BY-SA - https://paperswithcode.com

ConvBERT is a modification on the BERT architecture which uses a span-based dynamic convolution to replace self-attention heads to directly model local dependencies. Specifically a new mixed attention module replaces the self-attention modules in BERT, which leverages the advantages of convolution to better capture local dependency. Additionally, a new span-based dynamic convolution operation is used to utilize multiple input tokens to dynamically generate the convolution kernel. Lastly, ConvBERT also incorporates some new model designs including the bottleneck attention and grouped linear operator for the feed-forward module (reducing the number of parameters).