
What is: Ternary Weight Splitting?

Source: BinaryBERT: Pushing the Limit of BERT Quantization
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

Ternary Weight Splitting is a ternarization approach used in BinaryBERT that exploits the flatness of the ternary loss landscape as an optimization proxy for the binary model. We first train a half-sized ternary BERT to convergence, and then split both the latent full-precision weights $\mathbf{w}^{t}$ and the quantized weights $\hat{\mathbf{w}}^{t}$ into their binary counterparts $\mathbf{w}_{1}^{b}, \mathbf{w}_{2}^{b}$ and $\hat{\mathbf{w}}_{1}^{b}, \hat{\mathbf{w}}_{2}^{b}$ via the TWS operator. To inherit the performance of the ternary model after splitting, the TWS operator requires splitting equivalency (i.e., the same output given the same input):

$$\mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}, \quad \hat{\mathbf{w}}^{t}=\hat{\mathbf{w}}_{1}^{b}+\hat{\mathbf{w}}_{2}^{b}$$

While the solution to the above equation is not unique, we constrain the latent full-precision weights after splitting, $\mathbf{w}_{1}^{b}$ and $\mathbf{w}_{2}^{b}$, to satisfy $\mathbf{w}^{t}=\mathbf{w}_{1}^{b}+\mathbf{w}_{2}^{b}$. See the paper for more details.
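The constraint on the latent weights can be illustrated with a minimal NumPy sketch. This is not the paper's exact TWS operator (which fixes the split via closed-form coefficients so that the quantized constraint $\hat{\mathbf{w}}^{t}=\hat{\mathbf{w}}_{1}^{b}+\hat{\mathbf{w}}_{2}^{b}$ also holds); the splitting coefficient `a` below is a hypothetical choice, showing that many splits satisfy the full-precision half of the equivalency:

```python
import numpy as np

rng = np.random.default_rng(0)
w_t = rng.standard_normal(8)   # latent full-precision weights of the ternary model

# Hypothetical split: scale each weight by `a` for the first binary branch
# and by (1 - a) for the second, so their sum reproduces w_t exactly.
a = 0.6
w1 = a * w_t
w2 = (1.0 - a) * w_t

# Splitting equivalency on the latent weights: w_t == w1 + w2
assert np.allclose(w1 + w2, w_t)
```

Because any `a` satisfies this latent constraint, the split is underdetermined; the paper's TWS operator resolves this freedom by additionally requiring the quantized (binarized) branches to sum to the ternary quantized weights.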