What is TrOCR?
Source | TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models |
Year | 2021 |
Data Source | CC BY-SA - https://paperswithcode.com |
TrOCR is an end-to-end Transformer-based OCR model for text recognition that builds on pre-trained CV and NLP models. It uses the Transformer architecture for both image understanding and wordpiece-level text generation. The input text image is first resized to 384×384 and then split into a sequence of 16×16 patches, which serve as the input to the image Transformer. Both the encoder and the decoder use the standard Transformer architecture with self-attention, and the decoder generates wordpiece units as the recognized text for the input image.
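The patch-splitting step can be illustrated with a minimal, dependency-free sketch. It assumes the image has already been resized to 384×384 and uses 16×16 patches, as described in the TrOCR paper, which gives 24 × 24 = 576 patches. The function name `split_into_patches`, the single-channel nested-list image, and the row-major patch order are assumptions made for illustration, not part of the TrOCR code base. A real pipeline would work on multi-channel tensors and follow this step with a linear projection and position embeddings.

```python
def split_into_patches(image, patch_size=16):
    """Split a 2D image (list of rows) into flattened square patches.

    Hypothetical helper for illustration: patches are returned in
    row-major order (left to right, then top to bottom), and each
    patch is flattened into a list of patch_size * patch_size values.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Collect the pixels of one patch, row by row.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Assumed input: a blank 384x384 single-channel image, already resized.
SIDE = 384
image = [[0] * SIDE for _ in range(SIDE)]
patches = split_into_patches(image, patch_size=16)
print(len(patches), len(patches[0]))
```

Each flattened patch plays the role of one input token for the image encoder, so the sequence length grows with the square of the image side divided by the patch size.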