What is: TrOCR?

Source: TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Year: 2021
Data Source: CC BY-SA - https://paperswithcode.com

TrOCR is an end-to-end Transformer-based OCR model for text recognition that builds on pre-trained CV and NLP models. It uses the Transformer architecture for both image understanding and wordpiece-level text generation. The input text image is first resized to 384 × 384 and then split into a sequence of 16 × 16 patches, which serve as the input to the image Transformer. Both the encoder and the decoder use the standard Transformer architecture with self-attention, and the decoder generates wordpiece units as the recognized text from the input image.
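To make the patch arithmetic concrete, here is a minimal sketch of the patch-splitting step in plain Python. It is an illustration, not the TrOCR implementation: the function name `split_into_patches` is hypothetical, and the image is assumed to be single-channel and already resized to 384 × 384. A real model would work on multi-channel tensors and would also project each flattened patch to an embedding and add position information.

```python
def split_into_patches(image, patch_size=16):
    """Split a 2-D image (list of rows) into flattened, non-overlapping patches.

    Hypothetical helper for illustration only. Patches are returned in
    row-major order (left to right, then top to bottom).
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single list.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Dummy single-channel 384 x 384 image; each pixel value encodes its position.
SIZE = 384
image = [[row * SIZE + col for col in range(SIZE)] for row in range(SIZE)]

patches = split_into_patches(image, patch_size=16)
print(len(patches), len(patches[0]))  # 576 256
```

With a 384 × 384 input and 16 × 16 patches, the grid is 24 × 24, so the encoder sees a sequence of 576 patches, each holding 256 pixel values per channel.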