What is: Bort?
Source | Optimal Subarchitecture Extraction For BERT |
Year | 2020 |
Data Source | CC BY-SA - https://paperswithcode.com |
Bort is a parametric architectural variant of the BERT architecture. It extracts an optimal subset of architectural parameters for BERT through a neural architecture search approach, specifically a fully polynomial-time approximation scheme (FPTAS). This optimal subset, "Bort", is demonstrably smaller than the original BERT-large architecture: both its effective size (that is, not counting the embedding layer) and its net size are a small fraction of BERT-large's. Bort can also be pretrained in fewer GPU hours than are required to pretrain the highest-performing BERT parametric architectural variant, RoBERTa-large (RoBERTa), and about 33%
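To make "effective size" versus "net size" concrete, the sketch below counts parameters for a BERT-style encoder, treating the effective size as everything except the embedding layer, as the Bort paper describes it. It is a minimal sketch under stated assumptions, not the authors' counting code: it assumes the standard BERT layer layout (post-attention and post-feed-forward LayerNorm, a dense pooler that is counted, a 30,522-token vocabulary), and the `EncoderConfig`, `sizes`, and related helpers are hypothetical names introduced for illustration. In this standard layout the number of attention heads only splits the hidden dimension, so it does not change the count and is omitted. The `smaller` configuration is arbitrary and is not Bort's reported architecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container for the architectural parameters of a BERT-style encoder."""
    num_layers: int            # number of Transformer encoder layers
    hidden_size: int           # hidden (model) dimension H
    intermediate_size: int     # feed-forward inner dimension I
    vocab_size: int = 30522    # assumed BERT WordPiece vocabulary size
    max_positions: int = 512
    type_vocab_size: int = 2


def embedding_params(cfg: EncoderConfig) -> int:
    """Word, position and token-type embeddings plus one LayerNorm (gain and bias)."""
    h = cfg.hidden_size
    return (cfg.vocab_size + cfg.max_positions + cfg.type_vocab_size) * h + 2 * h


def layer_params(cfg: EncoderConfig) -> int:
    """Weights and biases of a single encoder layer."""
    h, i = cfg.hidden_size, cfg.intermediate_size
    attention = 4 * (h * h + h)                 # query, key, value and output projections
    feed_forward = (h * i + i) + (i * h + h)    # expand H -> I, then project I -> H
    layer_norms = 2 * (2 * h)                   # two LayerNorms, each with gain and bias
    return attention + feed_forward + layer_norms


def pooler_params(cfg: EncoderConfig) -> int:
    """Dense pooler applied to the first token's hidden state."""
    h = cfg.hidden_size
    return h * h + h


def sizes(cfg: EncoderConfig) -> tuple[int, int]:
    """Return (effective, net) parameter counts; effective excludes the embedding layer."""
    effective = cfg.num_layers * layer_params(cfg) + pooler_params(cfg)
    return effective, effective + embedding_params(cfg)


if __name__ == "__main__":
    bert_large = EncoderConfig(num_layers=24, hidden_size=1024, intermediate_size=4096)
    smaller = EncoderConfig(num_layers=4, hidden_size=768, intermediate_size=1024)  # hypothetical
    ref_eff, ref_net = sizes(bert_large)
    eff, net = sizes(smaller)
    print(f"BERT-large: effective={ref_eff:,} net={ref_net:,}")
    print(f"candidate:  effective={eff / ref_eff:.1%} net={net / ref_net:.1%} of BERT-large")
```

Under these assumptions BERT-large comes to 335,141,888 parameters in total, of which 31,782,912 sit in the embedding layer. Because the embedding layer does not shrink with depth, a much shallower model keeps a larger share of the net size than of the effective size, which is why the two ratios differ.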