
What is: MPNet?

Source: MPNet: Masked and Permuted Pre-training for Language Understanding
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

MPNet is a pre-training method for language models that combines masked language modeling (MLM) and permuted language modeling (PLM) in one unified view. Through permuted language modeling, it takes the dependency among the predicted tokens into consideration, and thus avoids the issue of BERT's MLM, which neglects this dependency. At the same time, it takes the position information of all tokens as input, so the model sees the positions of the full sentence, which alleviates the position discrepancy of XLNet.
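For readers who just want to use the resulting model, a pre-trained MPNet encoder is available through the Hugging Face transformers library. The sketch below is a minimal usage example, assuming that library (and PyTorch) is installed and using the publicly available microsoft/mpnet-base checkpoint:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load the pre-trained MPNet encoder (assumes the `transformers` and
# `torch` packages are installed; "microsoft/mpnet-base" is the public
# checkpoint on the Hugging Face Hub).
tokenizer = AutoTokenizer.from_pretrained("microsoft/mpnet-base")
model = AutoModel.from_pretrained("microsoft/mpnet-base")

inputs = tokenizer("MPNet combines masked and permuted pre-training.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per input token.
print(outputs.last_hidden_state.shape)  # (batch_size, seq_len, hidden_size)
```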

The training objective of MPNet is:

$$\mathbb{E}_{z \in \mathcal{Z}_n} \sum_{t=c+1}^{n} \log P\left(x_{z_t} \mid x_{z_{<t}}, M_{z_{>c}}; \theta\right)$$

Here $z$ is a permutation of the $n$ token positions, drawn from the set of permutations $\mathcal{Z}_n$, and $c$ is the number of non-predicted tokens. As can be seen, MPNet conditions on $x_{z_{<t}}$ (the tokens preceding the current predicted token $x_{z_t}$) rather than only on the non-predicted tokens $x_{z_{\le c}}$ as in MLM; compared with PLM, MPNet takes more information (i.e., the mask symbols $[M]$ at positions $z_{>c}$) as input. Although the objective looks simple, it is challenging to implement the model efficiently; for details, see the paper.
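To make the conditioning sets concrete, here is a minimal, self-contained sketch (the function name and token values are hypothetical; this illustrates the objective above, not the paper's actual implementation). It samples a permutation $z$, splits it at position $c$, and lists, for each prediction step $t > c$, the context $x_{z_{<t}}$ the model conditions on, alongside the mask symbols that carry position information for every predicted token:

```python
import random

def mpnet_inputs(tokens, c, seed=0):
    """Build MPNet-style inputs for one sequence (illustrative sketch).

    tokens: the original token sequence x_1..x_n
    c: number of non-predicted tokens; the remaining n - c are predicted
    """
    n = len(tokens)
    rng = random.Random(seed)
    z = list(range(n))
    rng.shuffle(z)            # a permutation z of the n positions

    non_pred = z[:c]          # positions z_{<=c}: visible content tokens
    pred = z[c:]              # positions z_{>c}: tokens to predict

    # Content stream: the non-predicted tokens plus a [M] symbol for every
    # predicted position, so position information of all n tokens is visible.
    content = [(tokens[i], i) for i in non_pred] + [("[M]", i) for i in pred]

    # Autoregressive factorization over the predicted part: step t conditions
    # on x_{z_<t} (non-predicted tokens plus previously predicted tokens) in
    # addition to the mask symbols above.
    steps = []
    for t in range(c, n):
        context = [tokens[i] for i in z[:t]]
        target = (tokens[z[t]], z[t])
        steps.append((context, target))
    return content, steps

content, steps = mpnet_inputs(["the", "cat", "sat", "on", "mat"], c=3)
print(content)  # tokens/positions the model can always see
for ctx, (tok, pos) in steps:
    print(f"predict {tok!r} at position {pos} given {ctx} + mask symbols")
```

Note how every predicted token contributes a $([M], \text{position})$ pair to the always-visible content stream: that is the extra information MPNet has over PLM, while the autoregressive context over $z_{<t}$ is the dependency information it has over MLM.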