What is: Bort?

Source: Optimal Subarchitecture Extraction For BERT
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

Bort is a parametric architectural variant of the BERT architecture. It is obtained by extracting an optimal subset of architectural parameters for BERT through a neural architecture search approach, specifically a fully polynomial-time approximation scheme (FPTAS). This optimal subset, “Bort”, is demonstrably smaller, having an effective size (that is, not counting the embedding layer) of 5.5% of the original BERT-large architecture, and 16% of the net size. Bort can also be pretrained in 288 GPU hours, which is 1.2% of the time required to pretrain the highest-performing BERT parametric architectural variant, RoBERTa-large (RoBERTa), and about 33%