XGPT is a method of cross-modal generative pre-training for image captioning designed to pre-train text-to-image caption generators through three novel generation tasks, including image-conditioned masked language modeling (IMLM), image-conditioned denoising autoencoding (IDA), and text-conditioned image feature generation (TIGF). The pre-trained XGPT can be fine-tuned without any task-specific architecture modifications and build strong image captioning models.

**BinaryBERT** is a [BERT](https://paperswithcode.com/method/bert)-variant that applies quantization in the form of weight binarization. Specifically, ternary weight splitting is proposed which initializes BinaryBERT by equivalently splitting from a half-sized ternary network. To obtain BinaryBERT, we first train a half-sized [ternary BERT](https://paperswithcode.com/method/ternarybert) model, and then apply a [ternary weight splitting](https://paperswithcode.com/method/ternary-weight-splitting) operator to obtain the latent full-precision and quantized weights as the initialization of the full-sized BinaryBERT. We then fine-tune BinaryBERT for further refinement.

BinaryBERT

BinaryBERT: Pushing the Limit of BERT Quantization

XGPT

XGPT: Cross-modal Generative Pre-Training for Image Captioning

**Cutout** is an image augmentation and regularization technique that randomly masks out square regions of input during training. and can be used to improve the robustness and overall performance of convolutional neural networks. The main motivation for cutout comes from the problem of object occlusion, which is commonly encountered in many computer vision tasks, such as object recognition,
tracking, or human pose estimation. By generating new images which simulate occluded examples, we not only better prepare the model for encounters with occlusions in the real world, but the model also learns to take more of the image context into consideration when making decisions

Source	XGPT: Cross-modal Generative Pre-Training for Image Captioning
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com