**Semantic reasoning network**, or **SRN**, is an end-to-end trainable framework for scene text recognition that consists of four parts: backbone network, parallel [visual attention](https://paperswithcode.com/method/visual-attention) module (PVAM), global semantic reasoning module (GSRM), and visual-semantic fusion decoder (VSFD). Given an input image, the backbone network is first used to extract 2D features $V$. Then, the PVAM is used to generate $N$ aligned 1-D features $G$, where each feature corresponds to a character in the text and captures the aligned visual information. These $N$ 1-D features $G$ are then fed into a GSRM to capture the semantic information $S$. Finally, the aligned visual features $G$ and the semantic information $S$ are fused by the VSFD to predict $N$ characters. For text string shorter than $N$, ’EOS’ are padded.

Pathology Language and Image Pre-Training (PLIP) is a vision-and-language foundation model created by fine-tuning CLIP on pathology images.

PLIP

Leveraging medical Twitter to build a visual–language foundation model for pathology AI

Semantic Reasoning Network

Towards Accurate Scene Text Recognition with Semantic Reasoning Networks

**SMITH**, or **Siamese Multi-depth Transformer-based Hierarchical Encoder**, is a [Transformer](https://paperswithcode.com/methods/category/transformers)-based model for document representation learning and matching. It contains several design choices to adapt [self-attention models](https://paperswithcode.com/methods/category/attention-modules) for long text inputs. For the model pre-training, a masked sentence block language modeling task is used in addition to the original masked word language model task used in [BERT](https://paperswithcode.com/method/bert), to capture sentence block relations within a document. Given a sequence of sentence block representation, the document level Transformers learn the contextual representation for each sentence block and the final document representation.

Source	Towards Accurate Scene Text Recognition with Semantic Reasoning Networks
Year	2000
Data Source	CC BY-SA - https://paperswithcode.com