
What is: MacBERT?

Source: Revisiting Pre-Trained Models for Chinese Natural Language Processing
Year: 2020
Data Source: CC BY-SA - https://paperswithcode.com

MacBERT is a Transformer-based model for Chinese NLP that alters RoBERTa in several ways, most notably through a modified masking strategy. Instead of masking with the [MASK] token, which never appears in the fine-tuning stage, MacBERT masks each selected word with a similar word. Specifically, MacBERT shares the same pre-training tasks as BERT, with several modifications. For the MLM task, the following changes are made (see the sketch after the list):

  • Whole-word masking and N-gram masking strategies are used to select candidate tokens for masking, with percentages of 40%, 30%, 20%, and 10% for word-level unigrams up to 4-grams.
  • Rather than masking with the [MASK] token, which never appears during fine-tuning, similar words are used for masking. A similar word is obtained with the Synonyms toolkit, which is based on word2vec similarity calculations. If an N-gram is selected for masking, a similar word is found for each word individually. In the rare case that no similar word exists, masking degrades to random word replacement.
  • 15% of the input words are selected for masking: 80% of these are replaced with similar words, 10% with random words, and the remaining 10% are kept unchanged.
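
The following is a minimal sketch of this masking procedure over a pre-segmented word sequence. It is illustrative only: `SIMILAR_WORDS`, `get_similar_word`, and `mac_mask` are hypothetical names, the toy synonym table stands in for the word2vec-based Synonyms toolkit used in the paper, and details such as span selection order are simplified assumptions.

```python
import random

# Assumption: a toy synonym table standing in for the Synonyms toolkit
# (word2vec-based similarity) used by MacBERT.
SIMILAR_WORDS = {"快乐": "开心", "喜欢": "喜爱"}


def get_similar_word(word, vocab):
    """Return a similar word; degrade to a random word if none is found."""
    similar = SIMILAR_WORDS.get(word)
    if similar is None or similar == word:
        return random.choice(vocab)  # rare case: random replacement
    return similar


def mac_mask(words, vocab, mask_ratio=0.15):
    """Sketch of MacBERT-style masking over a list of whole words.

    - ~15% of the input words are selected as masking targets.
    - Selected spans are word-level N-grams, with N drawn as 1/2/3/4
      with probability 40/30/20/10%.
    - 80% of selected words are replaced with similar words,
      10% with random words, and 10% are kept unchanged.
    """
    words = list(words)
    n_to_mask = max(1, round(len(words) * mask_ratio))
    positions = list(range(len(words)))
    random.shuffle(positions)
    covered = set()
    masked = 0

    for start in positions:
        if masked >= n_to_mask:
            break
        n = random.choices([1, 2, 3, 4], weights=[0.4, 0.3, 0.2, 0.1])[0]
        span = [i for i in range(start, min(start + n, len(words)))
                if i not in covered]
        for i in span:
            covered.add(i)
            r = random.random()
            if r < 0.8:                        # similar-word replacement
                words[i] = get_similar_word(words[i], vocab)
            elif r < 0.9:                      # random-word replacement
                words[i] = random.choice(vocab)
            # else: keep the original word (still a prediction target)
            masked += 1

    return words, sorted(covered)
```

The returned position list plays the role of the MLM label mask: the model is trained to recover the original words only at those positions, while the corrupted (or kept) words replace the [MASK] tokens of standard BERT pre-training.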